1 INTRODUCTION

Future  intelligence  services  and  armed  forces  will  be  increasingly  reliant  on 
distributed  swarms  of  smart  devices.  While  some  of  these  intelligent  swarms  are 
already  operational  —  teams  of  UAVs,  groups  of  robots,  networks  of  sensors  — 
deployment of others, such as self-assembled and self-reconfigurable structures,
smart materials, and medical nano-robots, is ten, twenty or more years in the future.

Although individual units in the swarms mentioned above have various levels
of individual complexity and size, swarms share common characteristics and
control issues: namely, the ability to function autonomously and robustly in uncertain
dynamic environments with a high probability of component failure.
Individual elements in a swarm also have limited and faulty sensing and communication
capabilities, and often need to coordinate to achieve a global goal
despite the highly heterogeneous and highly distributed nature of the system. Finally,
individual components should be endowed with rudimentary intelligence
and  learning  abilities  in  order  to  enhance  the  adaptability  and  capabilities  of  the 
swarm.  The  UAV  example  provides  an  illustration  for  why  these  capabilities  are 
necessary.  Although  some  UAVs  are  individually  controlled,  as  bigger  swarms 
are  deployed  for  wider,  more  detailed  surveillance,  remote  control  is  no  longer 
feasible,  and  UAVs  have  to  operate  autonomously.  Distributed  decision  making 
is thus critical to ensure robust coordination among UAVs, and to ensure mission
continuation even as individual UAVs return for refueling or delegate their
tasks  for  other  reasons. 

The  issues  listed  above  pose  fundamental  challenges  to  the  design  of  robust, 
scalable  swarm  control  algorithms.  The  challenge  is  made  even  greater  by  the 
fact  that  one  does  not  control  the  collective  behavior  of  the  swarm  directly  — 
rather,  it  emerges  out  of  interactions  among  individual  components  and  between 
components  and  the  environment. 

Over  the  course  of  this  project  we  have  developed  a  mathematical  framework 
for  studying  collective  behavior  of  multi-agent  swarms  (MAS).  This  framework 
will  allow  the  MAS  designer  to  rationally  specify  and  optimize  individual  control 
algorithms  that  will  lead  to  the  best  collective  behavior.  Our  mathematical 
approach  has  enabled  us  to  model  and  quantitatively  analyze  collective  behavior 
of  different  classes  of  agents: 

•  Simple  agents  using  reactive  control:  agents  decide  about  future  actions 
based  solely  on  input  from  sensors  (including  communication  with  other 
agents)  and  the  action  they  are  currently  executing.  Such  agents  can  be 
represented  as  stochastic  Markov  processes  [26,  29]. 

•  Next,  we  extended  the  formalism  to  adaptive  agents  that  change  their 
behavior  based  on  observations  of  the  environment.  These  agents  can  be 
represented by a generalized Markov process of order m, where m is the
number of observations used [24, 25].

•  We  also  created  a  framework  to  study  spatially  inhomogeneous  systems 
where agents are interacting with spatially extended fields, for example,
diffusing pheromones [11].

We  showed  that  an  equation,  known  as  the  Rate  Equation,  describes  the 
dynamics  of  the  collective  behavior  of  swarms.  The  Rate  Equation  formalism 
can  be  derived  from  theory  of  stochastic  processes  [26],  although  in  practice, 
the  models  are  usually  phenomenological.  The  Rate  Equation  approach  has 
been  applied  to  study  distributed  systems  of  reactive  robots  [43,  27,  23,  30,  1]. 
We formalized the approach and extended it to adaptive agents [25, 28] as well
as agents interacting through external fields [11]. Below we review the elements
of the mathematical formalism for reactive (Section 1.2), adaptive (Section 1.3)
and spatially interacting (Section 1.4) agents and illustrate it with a few results
from the robotics domain (Section 3.2, Section 3.1, and Section 3.3).

1.1  Background 

Mathematical  models  can  generally  be  broken  into  two  classes:  microscopic  and 
macroscopic.  Microscopic  descriptions  treat  the  agent  as  the  fundamental  unit 
of  the  model.  These  models  describe  the  agent’s  interactions  with  other  agents 
and  the  environment.  Solving  or  simulating  a  system  composed  of  many  such 
agents  gives  researchers  an  understanding  of  the  global  behavior  of  the  system. 
Examples  of  such  microscopic  models  are  reported  in  [31,  16].  Rather  than 
compute  the  exact  trajectories  and  sensory  information  of  individual  agents, 
their  behavior  is  modeled  as  a  series  of  stochastic  events,  with  probabilities 
determined  by  simple  geometric  considerations  and  systematic  experiments  with 
small  groups  of  agents.  Running  several  series  of  stochastic  events  in  parallel, 
one  for  each  agent,  allows  researchers  to  study  collective  MAS  behavior. 

A  macroscopic  model,  on  the  other  hand,  directly  describes  collective  MAS 
behavior.  It  is  computationally  efficient  because  it  uses  fewer  variables.  These 
models  have  been  successfully  applied  to  a  wide  variety  of  problems  in  physics, 
chemistry,  biology  and  the  social  sciences.  In  these  applications,  the  microscopic 
behavior of an individual (e.g., a Brownian particle in a volume of gas or an
individual residing in the US) is quite complex, often stochastic and only partially
predictable, and certainly analytically intractable. Rather than account for the
inherent variability of individuals, scientists model the behavior of some average
quantity that represents the system they are studying (e.g., volume of gas or
population of the US). Such macroscopic descriptions often have a very simple form
and  are  analytically  tractable.  It  is  important  to  remember  that  such  models 
do  not  reproduce  the  results  of  a  single  experiment  —  rather,  the  behavior  of 
some observable averaged over many experiments or observations. The two description
levels are, of course, related: we can start from the Stochastic Master
Equation that describes the evolution of an agent’s probability density and get
the Rate Equation, a macroscopic model, by averaging it [26]. In most cases,
however, Rate Equations are phenomenological in nature, i.e., not derived from
first principles. Nevertheless, we have developed a “recipe” that allows one to formulate
the Rate Equations describing the dynamics of a homogeneous MAS composed
of reactive agents simply by examining the details of the individual agent controller.

1.2 Stochastic Approach to Modeling Multi-agent Systems

The behavior of individual agents in, for example, a robotic swarm has many
complex influences, even in a controlled laboratory setting. Robots are influenced
by external forces, many of which may not be anticipated, such as friction,
battery power, sound or light signals, etc. Even if all the forces are known in
advance, the robots are still subject to random events: fluctuations in the environment,
as well as noise in the robot’s sensors and actuators. A robot will
interact with other robots whose exact trajectories are equally complex, making
it impossible to know which robots will come in contact with one another.
Finally, the designer can take advantage of the unpredictability and incorporate
it directly into the robot’s behavior: e.g., the simplest effective policy for obstacle
avoidance is for the robot to turn a random angle and move forward. In
summary, the behavior of robots in a swarm is so complex that it is best described
probabilistically, as a stochastic process.


Figure  1:  Diagram  of  a  robot  controller  for  the  simplified  foraging  scenario 


Consider  Figure  1  —  a  controller  for  a  simplified  foraging  scenario.  Each 
box  represents  a  robot’s  state  —  the  action  it  is  executing.  In  the  course  of 
accomplishing  the  task,  the  robot  will  transition  from  searching  to  puck  pick-up 
to  homing.  Transitions  between  states  are  triggered  by  external  stimuli,  such 
as encountering a puck. This robot can be described as a stochastic Markov
process (footnote 1), and the diagram in Figure 1 is, therefore, the Finite State Automaton
(FSA)  of  the  controller. 

The  stochastic  processes  approach  allows  us  to  mathematically  study  the 
behavior  of  robot  swarms  and  other  multi-agent  systems.  Let  p(n,t)  be  the 
probability  a  reactive  robot  is  in  state  n  at  time  t.  The  Markov  property  allows 
us to write the change in probability density as [26]

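$$
\begin{aligned}
\Delta p(n,t) &= p(n,t+\Delta t) - p(n,t) \\
&= \sum_{n'} p(n,t+\Delta t \mid n',t)\, p(n',t) - \sum_{n'} p(n',t+\Delta t \mid n,t)\, p(n,t). \qquad (1)
\end{aligned}
$$

The conditional probabilities define the transition rates for a Markov process

$$
W(n \mid n'; t) = \lim_{\Delta t \to 0} \frac{p(n,t+\Delta t \mid n',t)}{\Delta t}. \qquad (2)
$$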


Footnote 1: A Markov process’s future state depends only on its present state and not on any of its past states.

The quantity p(n, t) also describes a macroscopic variable: the fraction of
robots in state n, with Eq. (1) describing how this variable changes in time. Averaging
both sides of the equation over the number of robots (and assuming only
individual transitions between states are allowed), we obtain in the continuous
limit (Δt → 0)

$$
\frac{dN_n(t)}{dt} = \sum_{n'} W(n \mid n'; t)\, N_{n'}(t) - \sum_{n'} W(n' \mid n; t)\, N_n(t), \qquad (3)
$$


where N_n(t) is the average number of robots in state n at time t. This is the
so-called Rate Equation. It is sometimes also written in a discrete form, as
a finite difference equation that describes the behavior of N(kT), k being an
integer and T the discretization interval, with the time derivative replaced by
(N(t + T) − N(t))/T. Eq. (3) has the
following interpretation: the number of robots in state n will increase in time
due to transitions to state n from other states, and it will decrease in time due
to the transitions from state n to other states.
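
The discrete form of the Rate Equation can be iterated directly. The following is a minimal Python sketch, assuming a hypothetical three-state cycle (searching, pickup, homing) with constant, made-up transition rates; the function name `euler_rate_equation` and all numerical values are illustrative assumptions, not taken from the experiments discussed in this report.

```python
def euler_rate_equation(N0, W, T, steps):
    """Iterate N(t + T) = N(t) + T * dN/dt for the Rate Equation (3).

    N0    : list of initial average robot counts, one per state
    W     : W[n][m] is the (assumed constant) rate of transitions m -> n
    T     : discretization interval
    steps : number of iterations
    """
    N = list(N0)
    states = range(len(N))
    for _ in range(steps):
        dN = []
        for n in states:
            # Gain: transitions into state n from every other state m.
            gain = sum(W[n][m] * N[m] for m in states if m != n)
            # Loss: transitions out of state n into every other state m.
            loss = sum(W[m][n] for m in states if m != n) * N[n]
            dN.append(gain - loss)
        N = [N[n] + T * dN[n] for n in states]
    return N


# Hypothetical cycle: 0 = searching, 1 = pickup, 2 = homing.
W = [
    [0.0, 0.0, 0.2],   # homing -> searching
    [0.1, 0.0, 0.0],   # searching -> pickup
    [0.0, 0.5, 0.0],   # pickup -> homing
]
N_final = euler_rate_equation([20.0, 0.0, 0.0], W, T=0.1, steps=5000)
```

Because every loss term of one state is a gain term of another, the total number of robots is conserved by the update, which is a useful sanity check on any model written in this form.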

The  Rate  Equation  is  a  useful  tool  for  mathematical  analysis  of  collective 
dynamics  of  robot  swarms.  To  facilitate  the  analysis,  we  begin  by  drawing 
the  macroscopic  state  diagram  of  the  system.  The  collective  behavior  of  the 
swarm  is  captured  by  an  FSA  that  is  functionally  identical  to  the  individual 
robot  FSA,  except  that  each  state  of  the  automaton  now  represents  the  number  of 
robots executing that action [27, 23, 30]. Not every microscopic robot behavior
needs to become a macroscopic state. In order to keep the model tractable,
it  is  often  useful  to  coarse-grain  it  by  considering  several  related  actions  or 
behaviors  as  a  single  state.  For  example,  we  may  take  the  searching  state  of 
robots  to  consist  of  the  actions  wander  in  the  arena,  detect  objects  and  avoid 
obstacles.  When  necessary,  the  searching  state  can  be  split  into  three  states, 
one  for  each  behavior;  however,  we  are  often  interested  in  the  minimal  model 
that  captures  the  important  behavior  of  the  system.  Coarse-graining  presents 
a  way  to  construct  such  a  minimal  model. 

The macroscopic automaton can be directly translated into the Rate Equations.
Each state in the automaton becomes a dynamic variable N_n(t), with its
own Rate Equation. Every transition will be accounted for by a term in the
equation: a positive term, W(n|n') N_{n'}, for each incoming arrow and a negative
term, W(n'|n) N_n, for each outgoing arrow.
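
The translation from automaton to equations is mechanical enough to be sketched in a few lines of Python. In the sketch below, the automaton is given as a list of arrows; the helper `make_rate_equations`, the two-state automaton, and the numbers `M` and `alpha` are hypothetical and chosen only to illustrate the bookkeeping.

```python
def make_rate_equations(states, transitions):
    """Return a function rhs(N) giving dN/dt for each macroscopic state.

    states      : list of state names
    transitions : list of (source, target, rate) arrows; `rate` is a
                  function of the current counts N (a dict), so a rate may
                  depend on other states or on environmental stimuli
    """
    def rhs(N):
        dN = {s: 0.0 for s in states}
        for source, target, rate in transitions:
            flow = rate(N) * N[source]
            dN[target] += flow   # incoming arrow: positive term
            dN[source] -= flow   # outgoing arrow: negative term
        return dN
    return rhs


# Hypothetical two-state foraging automaton with made-up rates.
M = 10.0        # assumed number of pucks in the arena
alpha = 0.02    # assumed detection rate per puck
states = ["searching", "homing"]
transitions = [
    ("searching", "homing", lambda N: alpha * M),
    ("homing", "searching", lambda N: 0.5),
]
rhs = make_rate_equations(states, transitions)
dN = rhs({"searching": 8.0, "homing": 2.0})
```

Each arrow contributes the same flow with opposite signs to its two end states, so the right-hand sides always sum to zero.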

Finding  an  appropriate  mathematical  form  for  the  transition  rates  is  the 
main  challenge  in  studying  real  systems.  The  transition  is  triggered  by  some 
stimulus  —  be  it  another  robot  in  a  particular  state,  an  object  to  be  picked 
up,  etc.  In  order  to  compute  the  transition  rates,  we  assume,  for  simplicity, 
that  robots  and  stimuli  are  uniformly  distributed.  The  transition  rates  then 
have the following form: W(n|n') ∝ M, where M is the environmental stimulus
encountered (e.g., the number of sticks in the arena). The proportionality factor
connects  the  model  to  experiments,  and  it  depends  on  the  rate  at  which  a  robot 
detects  sticks.  It  can  be  roughly  estimated  from  first  principles  (“scattering 
cross  section”  approach  [27]),  measured  from  simulations  or  experiments  with 
one or two robots, or left as a model parameter. There will be cases where the
uniformity assumption fails: e.g., in overcrowded scenarios where robots, depending
on their obstacle avoidance controller, tend to clump, forming “robotic
clouds”  [30].  If  the  transition  rates  cannot  be  calculated  from  first  principles, 
it  may  be  expedient  to  leave  them  as  parameters  of  the  model  and  obtain  them 
by  fitting  the  model  to  data.  We  illustrate  the  approach  in  the  sections  below 
by  applying  it  to  study  foraging  (Section  3.2)  and  collaboration  (Section  3.1)  in 
multi-robot  systems. 

1.3  Modeling  Adaptive  Agents 

If  each  agent  had  instantaneous  global  knowledge  of  the  environment  and  the 
state  of  other  agents,  the  system  could  dynamically  adapt  to  any  changes. 
In most situations, such global knowledge is impractical. However, for sufficiently
slow dynamics, agents can correctly estimate the state of the environment
through repeated local observations (by storing them in memory). We developed
a theory of such a memory-based adaptation mechanism [24, 25], which
is outlined here. Let p(n, t) be the probability an agent is in state n at time
t. We note that for a homogeneous system of independent and indistinguishable
agents, p(n, t) describes the macroscopic state of the system, since it is simply
the fraction of agents in state n. Let us assume that the agents use a finite
memory of length m of the past of the system in order to estimate the present
state of the environment and make decisions about future actions. Then the
evolution of the system can be represented as a generalized Markov process
of order m. This means that the state of an agent at time t + Δt depends not
only on the configuration of the system at time t (as in simple Markov systems),
but also on configurations at times t − Δt, t − 2Δt, ..., t − (m − 1)Δt, which
we refer to as the history h of the system. In the derivation below we will employ
the following identities:

$$
p(n, t+\Delta t \mid h) = \sum_{n'} p(n, t+\Delta t \mid n', t; h)\, p(n', t \mid h)
\quad\text{and}\quad
\sum_{n'} p(n', t+\Delta t \mid n, t; h) = 1.
$$

Let us introduce the probability distribution function over the histories (for
a homogeneous system this distribution is the same for all the agents), p(h, t), with
$\sum_{h \in H} p(h, t) = 1$, where H is the set of all feasible histories (if it is continuous,
one should use integration instead of summation for proper normalization). We
can then write the change in probability density Δp as:

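$$
\begin{aligned}
\Delta p(n,t) &= p(n,t+\Delta t) - p(n,t) \\
&= \sum_{h} \left[ p(n,t+\Delta t \mid h) - p(n,t \mid h) \right] p(h,t) \\
&= \sum_{h} \sum_{n'} p(n,t+\Delta t \mid n',t;h)\, p(n',t \mid h)\, p(h,t) \\
&\quad - \sum_{h} \sum_{n'} p(n',t+\Delta t \mid n,t;h)\, p(n,t \mid h)\, p(h,t). \qquad (4)
\end{aligned}
$$

In the continuum limit, as Δt → 0, Δp/Δt can be written as

$$
\begin{aligned}
\frac{dp(n,t)}{dt} &= \sum_{h} \sum_{n'} W(n \mid n'; h)\, p(n',t \mid h)\, p(h,t) \\
&\quad - \sum_{h} \sum_{n'} W(n' \mid n; h)\, p(n,t \mid h)\, p(h,t), \qquad (5)
\end{aligned}
$$

with transition rates

$$
W(n \mid n'; h) = \lim_{\Delta t \to 0} \frac{p(n,t+\Delta t \mid n',t;h)}{\Delta t}. \qquad (6)
$$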


In its most general form, Eq. (5) is analytically intractable due to strong correlations
both in time and state-space. Instead, we average over all agents as in
the preceding section and derive the macroscopic equation for the rate of change
of ⟨N_n⟩, the average number of agents in state n:

$$
\frac{dN_n}{dt} = \sum_{n'} \left[ \langle W(n \mid n') \rangle_h N_{n'} - \langle W(n' \mid n) \rangle_h N_n \right]. \qquad (7)
$$

Here, for notational convenience, ⟨...⟩_h denotes an average over histories, and we
have dropped the angle brackets around N, although this variable still denotes an
average quantity.

Equation (7) is very similar to the rate equation we used to study Markov-based
agent systems, Eq. (3), except that the transition rates W(n|n') are now replaced
by their history-averaged values. We will use the above equation to
study  how  agents  can  use  histories,  or  memories  of  past  events,  to  improve  the 
collective  behavior  of  the  system.  We  illustrate  the  approach  in  Section  3.3  by 
examining  adaptive  task  allocation  in  robots. 
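
To make the history average concrete, the following Python sketch estimates a history-averaged transition rate from a sample of agent histories, weighting each distinct history by its observed frequency. The switching rule `switch_rate`, the memory length m = 3, and the sample itself are hypothetical and serve only to illustrate the averaging step; they are not the rule analyzed in Section 3.3.

```python
from collections import Counter


def history_averaged_rate(rate_given_history, histories):
    """Estimate <W>_h = sum_h W(h) p(h) from a sample of histories.

    rate_given_history : function mapping a history (tuple) to a rate W(h)
    histories          : list of observed histories, one per agent
    """
    counts = Counter(histories)
    total = sum(counts.values())
    # Empirical p(h) is the fraction of agents holding history h.
    return sum(rate_given_history(h) * c / total for h, c in counts.items())


# Hypothetical rule: the switching rate grows with the fraction of 1s
# among the last m = 3 binary observations stored by the agent.
def switch_rate(history, base=0.3):
    return base * sum(history) / len(history)


sample = [(1, 1, 0), (0, 0, 0), (1, 1, 0), (1, 0, 0)]
W_avg = history_averaged_rate(switch_rate, sample)
```

The resulting number plays the role of a single history-averaged coefficient in the macroscopic equation; in a full model it would be recomputed as the distribution over histories evolves.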


1.4  Modeling  Spatially  Correlated  Systems 

While the approach outlined above works well for many spatially uniform systems,
it is too coarse-grained for systems with a spatial correlation in agents’
interactions. Thus, it is not sufficient to describe, for example, an ant-like
swarm where agents interact through evolving chemical fields or robots monitoring
chemicals released into a fluid. These situations require a generalization
of the Master Equation, in which each robot not only has a discrete controller
state k but also a continuous coordinate x (i.e., its spatial location). As with
the original formulation, we suppose the number of agents in each state is sufficient
to determine the collective behavior of interest. Because x is a continuous
variable, these counts become densities, leading us to introduce n_k(x, t) as the
average robot fraction density in state k at location x and time t. Thus a small
volume Δx around location x contains, on average, the fraction n_k(x, t)Δx of
the robots in the system.

Let  us  consider  a  system  where  agents  interact  with  the  environment  through 
a  certain  external  chemical  field.  Let  us  also  assume  that  agents  are  able  to 
interact through stigmergy by releasing a special chemical into the environment
that we call the communicative signal. We denote by p(x, t) and c(x, t) the concentrations
of the chemical and the communicative signal, respectively, at point x at time t.
Then we can write down the generalized rate equation as follows:

$$
\begin{aligned}
\frac{\partial n_k(\mathbf{x},t)}{\partial t} &= \int d\mathbf{x}' \sum_j w_{jk}(\mathbf{x}', \mathbf{x}; p, c)\, n_j(\mathbf{x}',t) \\
&\quad - n_k(\mathbf{x},t) \int d\mathbf{x}' \sum_j w_{kj}(\mathbf{x}, \mathbf{x}'; p, c). \qquad (8)
\end{aligned}
$$

Now the transition rates w_jk depend not only on the state indices j and k and
the occupation vector but also on the spatial coordinates and the concentrations of the
chemicals at the corresponding points. Note also that we have included the
dependence of the transition rates on x and x' explicitly to account for agents’
kinematics even in the absence of chemical and communicative concentrations
(e.g., to describe a freely diffusing agent).

The transition rates w_jk(x, x'; p(x), c(x)) summarize the behaviors of the
individual robots. For example, the robot’s internal state could change when
it detects a chemical concentration above a predetermined threshold. Communication
among nearby robots allows the robots to reduce noise in estimating
chemical gradients and hence perform better than individual robots, but at a
cost of additional power use for the communication. Robot motion, either moving
passively with the fluid or using powered locomotion, e.g., to follow chemical
concentration gradients, also contributes to the transitions.

While Equation (8) is a general description of the overall system behavior, it
is too complex in its present form to be useful. Fortunately, it can be simplified
considerably into a more intuitive form by noting that in many physically realistic
situations agents’ motion can be decoupled from state transitions, so that
the transition rate can be represented as

$$
w_{jk} = \delta_{jk}\, W_k(\mathbf{x}, \mathbf{x}'; p(\mathbf{x}), p(\mathbf{x}'), c(\mathbf{x}), c(\mathbf{x}')) + \delta(\mathbf{x} - \mathbf{x}')\, w_{jk}(p(\mathbf{x}), c(\mathbf{x})), \qquad (9)
$$

where δ_jk is Kronecker’s symbol (footnote 2) and δ(x) is its continuous analogue, the δ-function.
In other words, during a transition between two discrete states we neglect the
change in the robot’s position. In Eq. (9), W_k is an appropriately chosen kernel that
describes agents’ motion (as the index k indicates, it can be different for each state),
while the second term describes transitions between discrete states.

Equation (9) allows us to separate the transition function into terms with purely
spatial transitions and terms with purely state transitions. Indeed, using Eq. (9)
we can decouple the agents’ kinematics from the transitions between discrete
states and rewrite Eq. (8) as follows:

$$
\frac{\partial n_k(\mathbf{x},t)}{\partial t} = \mathcal{L}_k n_k(\mathbf{x},t) + \sum_j w_{jk}(p,c)\, n_j(\mathbf{x},t) - n_k(\mathbf{x},t) \sum_j w_{kj}(p,c). \qquad (10)
$$

Footnote 2: Kronecker’s symbol is defined as follows: δ_ij = 1 if i = j and δ_ij = 0 if i ≠ j.

Here L_k is an operator (specified below) that describes the motion of agents in
state k. The second and third terms in Eq. (10) describe agents’ state transitions.
Note that now w_jk(p, c) depends on spatial coordinates indirectly, through the concentrations
p(x, t) and c(x, t). When the concentrations p and c are constants,
Eq. (3) is recovered by integrating Eq. (10) over the spatial coordinates x, assuming
that L_k preserves the number of agents in state k (e.g., no absorbing boundaries)
so that the integral over the first term in Eq. (10) vanishes.

To specify the operators L_k, we note that for the particular environment we
are interested in (i.e., microscopic robots operating in a fluid), robots’ motion
can be described by a diffusion equation [19]. We studied [11] chemotactic robots
that respond to chemical and signalling fields by propelling themselves in the
direction of increasing concentration. This capability is modeled after bacterial
chemotaxis, which allows these single-cell organisms to efficiently move towards
food sources and away from noxious sources. Although in some cases the exact
derivation from the microscopic transition rates is feasible, if very involved (see,
for example, [36, 7] for treatment of bacterial chemotaxis, which can be treated
as a biased random walk), chemotaxis in a chemical concentration field p(x, t) is
usually introduced into the rate equations phenomenologically by postulating a
chemotactic velocity V_p = η_p ∇p(x, t), where η_p is the so-called chemotactic
sensitivity (which may itself depend on p). One can then write for the operators L_k

$$
\mathcal{L}_k = D_k \nabla^2 - \mathbf{v} \cdot \nabla - \nabla \cdot \left[ \mathbf{V}_p(p, \nabla p) + \mathbf{V}_c(c, \nabla c) \right]. \qquad (11)
$$

Here, D_k is the diffusion coefficient of agents in state k, assumed to be a constant,
v is the flow velocity, and V_p and V_c are the chemotaxis drift velocities of
robots due to concentration gradients of the chemical and the communicative
signal, respectively.

To proceed further, we should also define how the chemical and communicative signal
fields evolve in time. As an example relevant for microscopic robots, we consider
the evolution of these fields in a moving fluid in which the robots operate. The
evolution of p(x, t) and c(x, t) is governed by the diffusion equations:

$$
\frac{\partial p}{\partial t} = D_p \nabla^2 p - \mathbf{v} \cdot \nabla p - \gamma_p p + Q_p(\mathbf{x}, t), \qquad (12)
$$

$$
\frac{\partial c}{\partial t} = D_c \nabla^2 c - \mathbf{v} \cdot \nabla c - \gamma_c c + \sum_k q_k n_k(\mathbf{x}, t). \qquad (13)
$$

In Eq. (12) the terms on the right describe, respectively, the diffusion of the
chemical (with a diffusion constant D_p), the advection of the chemical due to
fluid motion with velocity v, the decay of the chemical at rate γ_p, and its deposition
by sources with intensity profile Q_p(x, t). Terms in Eq. (13) have similar
meaning, except that the deposition rate of the signalling chemical is proportional to
the fraction of agents in state k, n_k(x, t) (note that, generally speaking, the
coefficients q_k themselves depend on the fraction of agents in state k). The
parameters in these equations could, in general, depend on space and time, as
well as the location of the robots (e.g., a sufficiently high concentration of the
robots could significantly affect the fluid flow). For simplicity, we will treat
them as constants. For microscopic robots, fluid motions will usually be at
very low Reynolds number so the fluid flow will be laminar with the velocity
v changing smoothly with location. Viscous forces dominate the motions of
such robots, with requirements for locomotion mechanisms and power use quite
different from experiences with larger robots [39].
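
The field equation for the chemical concentration can be explored numerically before committing to an analytical treatment. The following Python sketch advances a one-dimensional version of the equation with diffusion, advection, decay, and a source term by one explicit Euler step using central differences. The periodic boundary, the grid, and all parameter values are assumptions made for illustration; this is not the numerical scheme used for the results in Section 3.4.

```python
def step_concentration(p, D, v, gamma, Q, dx, dt):
    """One explicit Euler step of dp/dt = D p'' - v p' - gamma p + Q
    on a periodic 1-D grid, using central differences in space."""
    n = len(p)
    new_p = []
    for i in range(n):
        left, right = p[i - 1], p[(i + 1) % n]   # periodic neighbours
        diffusion = D * (right - 2.0 * p[i] + left) / dx ** 2
        advection = -v * (right - left) / (2.0 * dx)
        new_p.append(p[i] + dt * (diffusion + advection - gamma * p[i] + Q[i]))
    return new_p


# Made-up parameters: a point source in the middle of a 50-cell domain.
n, dx, dt = 50, 1.0, 0.1
D, v, gamma = 1.0, 0.2, 0.05
Q = [0.0] * n
Q[n // 2] = 1.0
p = [0.0] * n
for _ in range(5000):
    p = step_concentration(p, D, v, gamma, Q, dx, dt)
total = sum(p) * dx
```

On a periodic grid the diffusion and advection terms only redistribute the chemical, so the total amount relaxes to the ratio of the integrated source strength to the decay rate. The explicit scheme is only stable for small enough time steps (here D·dt/dx² = 0.1).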

In  Section  3.4  we  motivate  the  approach  by  describing  a  medically  relevant 
scenario that considers a swarm of microscopic robots moving in a fluid to localize
a chemical source. We solve a one-dimensional model and analyze different
design  choices. 

1.5  Learning  Approaches  to  Distributed  Coordination 

The problem of coordination in multi-agent systems, where agents have to
achieve a consensus in their actions to receive maximum reward, is an important
problem that has attracted much interest. Reinforcement learning and
game dynamics have been shown to be general and robust methods for achieving
coordination in MAS, even when agents are not directly communicating or
sharing information [42]. In the game theory formalism, each agent is characterized
by a set of strategies and it seeks to maximize its payoff (i.e., utility or
profit). Game dynamics studies the behavior of agents in response to games that
are played many times successively. Over the course of the games, the winning
strategies are rewarded, losing ones are penalized, and the agents maximize
their profit or utility by choosing the best performing strategies. It is the extra
degree of freedom, characterized by the agents’ strategies, that allows the system
to adapt in dynamic environments. Game dynamics has a number of appealing
properties as a control mechanism for multi-agent systems: it is distributed,
flexible and scalable. The agents may vary in complexity from very simple
agents, who have no information about other players or the rules of the
game and may not even be aware of the other players’ existence, to more complex
deliberative agents who can strategize and reason about their opponents’ beliefs
and actions. The agents may act independently of one another or jointly, and
they may cooperate or compete.

The  Minority  Game  (MG)  was  introduced  as  a  simplification  of  Arthur’s  El 
Farol  Bar  attendance  problem  [2].  The  MG  consists  of  N  agents  with  bounded 
rationality that repeatedly choose between two alternatives labelled 0 and 1
(e.g., staying at home or going to the bar). At each time step, agents who
made the minority decision win. In the Generalized Minority Game, the winning
group is 1 (0) if the fraction of the agents who chose “1” is smaller (greater)
than the capacity level η, 0 < η < 1 (for η = 0.5, the game reduces to
the traditional MG). Each agent uses a set of S strategies to decide its next
move  and  reinforces  strategies  that  would  have  predicted  the  winning  group. 
A  strategy  is  simply  a  lookup  table  that  prescribes  a  binary  output  for  all 
possible  inputs.  In  the  original  version  of  the  game,  the  input  is  a  binary  string 
containing  the  last  m  outcomes  of  the  game,  so  the  agents  interact  by  sharing 
the  same  global  signal.  If  the  agents  choose  either  action  with  probability  1/2 
(the random choice game), then, on average, the number of agents choosing
“1” (henceforth referred to as attendance) is (N − 1)/2 with standard deviation
σ = √N/2 in the limit of large N. The most interesting phenomenon of the
minority model is the emergence of a coordinated phase, where the standard
deviation of attendance, the volatility, becomes smaller than in the random
choice game. The coordination is achieved for memory sizes for which the
dimension of the reduced strategy space is comparable to the number of agents
in the system [4, 40], 2^m ~ N.
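
The basic game is simple enough to simulate in a few lines. The Python sketch below follows the description above (lookup-table strategies over the last m outcomes, reinforcement of strategies that would have predicted the winning group) for the traditional MG with η = 0.5. The function names, the default parameters, and the tie-breaking rule (the first of equally scored strategies is used) are assumptions made for illustration.

```python
import random


def minority_game(N=101, m=3, S=2, steps=2000, seed=0):
    """Simulate the basic Minority Game; return the attendance series."""
    rng = random.Random(seed)
    n_hist = 2 ** m                       # number of possible m-bit histories
    # Each strategy is a lookup table: history index -> action (0 or 1).
    strategies = [[[rng.randint(0, 1) for _ in range(n_hist)]
                   for _ in range(S)] for _ in range(N)]
    scores = [[0] * S for _ in range(N)]
    history = rng.randrange(n_hist)       # last m outcomes packed into an int
    attendance = []
    for _ in range(steps):
        actions = []
        for a in range(N):
            # Play the best-scoring strategy (ties go to the first one).
            best = max(range(S), key=lambda s: scores[a][s])
            actions.append(strategies[a][best][history])
        A = sum(actions)                  # number of agents choosing "1"
        winner = 1 if A < N / 2 else 0    # the minority side wins
        for a in range(N):
            for s in range(S):
                if strategies[a][s][history] == winner:
                    scores[a][s] += 1     # reinforce correct predictions
        history = ((history << 1) | winner) % n_hist
        attendance.append(A)
    return attendance


def volatility(series):
    """Standard deviation of the attendance series."""
    mean = sum(series) / len(series)
    return (sum((x - mean) ** 2 for x in series) / len(series)) ** 0.5


att = minority_game()
sigma = volatility(att)
```

Running the sketch for several values of m at fixed N, and comparing `sigma` with √N/2, is one way to explore the regimes discussed above; N is taken odd so that a minority always exists.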

In  addition  to  the  original  MG,  different  versions  of  the  game  where  the 
agents  interact  using  local  information  only  have  been  studied.  In  particular, 
it  was  established  that  coordination  still  arises  out  of  local  interactions,  and 
the  system  as  a  whole  achieves  “better  than  random”  performance  in  terms  of 
the utilization of resources. Note that reinforcement learning is similar to game
dynamics, in that an agent receives a payoff that allows it to determine the best
actions. Minority Games and reinforcement learning in general can serve as
a  general  paradigm  for  resource  allocation  and  load  balancing  in  multi-agent 
systems. 

In  all  previous  studies  the  capacity  level  has  been  fixed  as  an  external  pa¬ 
rameter,  so  the  environment  in  which  the  agents  compete  is  stationary.  In  many 
situations,  however,  agents  have  to  operate  in  dynamic  environments.  We  ad¬ 
dressed  this  problem  in  our  research.  Namely,  we  studied  a  system  of  boolean 
agents  playing  a  generalized  minority  game,  and  assumed  that  the  capacity  level 
is not fixed but varies with time, $\eta(t) = \eta_0 + \eta_1(t)$, where $\eta_1(t)$ is a
time-dependent perturbation. The framework of the interactions was based on Kauffman
NK  random  boolean  nets  [20] ,  where  each  agent  gets  its  input  from  K  other  ran¬ 
domly  chosen  agents,  and  maps  the  input  to  a  new  state  according  to  a  boolean 
function  of  K  variables,  which  is  also  randomly  chosen  and  quenched  through¬ 
out  the  dynamics  of  the  system.  The  generalization  we  made  is  that  agents  are 
allowed  to  adapt  by  having  more  than  one  boolean  function,  or  strategy,  and 
the  use  of  a  particular  strategy  is  determined  by  an  agent  based  on  how  often 
it  predicted  the  winning  group  throughout  the  game. 
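
The structure of such an adaptive boolean network can be sketched in a few lines of Python. This is a hedged illustration, not the model used in the study: the function name, the sinusoidal form of the perturbation, the rule that choice "1" wins while attendance stays within capacity, and the tie-breaking between equally scored strategies are all assumptions introduced here.

```python
import math
import random

def play_adaptive_boolean_game(n_agents=101, k=2, n_strategies=2, steps=400,
                               eta0=0.5, amplitude=0.15, period=100, seed=1):
    """Return a list of (capacity, attendance) pairs, one per time step."""
    rng = random.Random(seed)
    # Each agent reads the states of K other, randomly chosen agents.
    inputs = [rng.sample([j for j in range(n_agents) if j != i], k)
              for i in range(n_agents)]
    # Quenched strategies: each one is a random truth table over 2**K inputs.
    tables = [[[rng.getrandbits(1) for _ in range(2 ** k)]
               for _ in range(n_strategies)] for _ in range(n_agents)]
    scores = [[0] * n_strategies for _ in range(n_agents)]
    state = [rng.getrandbits(1) for _ in range(n_agents)]
    history = []
    for t in range(steps):
        # ASSUMPTION: eta(t) = eta0 + eta1(t) with a sinusoidal eta1,
        # expressed as a fraction of the number of agents.
        capacity = (eta0 + amplitude * math.sin(2 * math.pi * t / period)) * n_agents
        # Encode each agent's K input bits as an index into its truth tables.
        index = [sum(state[j] << bit for bit, j in enumerate(inputs[i]))
                 for i in range(n_agents)]
        # Every agent plays its best-scoring strategy (ties: lowest index).
        new_state = []
        for i in range(n_agents):
            best = max(range(n_strategies), key=lambda s: scores[i][s])
            new_state.append(tables[i][best][index[i]])
        attendance = sum(new_state)
        # ASSUMPTION: choice "1" wins while attendance stays within capacity.
        winner = 1 if attendance <= capacity else 0
        # Reward every strategy that would have predicted the winning group.
        for i in range(n_agents):
            for s in range(n_strategies):
                if tables[i][s][index[i]] == winner:
                    scores[i][s] += 1
        state = new_state
        history.append((capacity, attendance))
    return history

history = play_adaptive_boolean_game()
print(history[-5:])
```

Plotting attendance against capacity over time would show whether the agents track the moving capacity under these assumed rules.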


2  SUMMARY  OF  PROJECT  RESULTS 

We have applied the mathematical formalism outlined above to study the
collective behavior of distributed systems of mobile robots for which a body
of experimental and simulation data exists. In this section we summarize the
main results; the later sections describe the particular applications in
detail.

2.1  Collective  Behavior  of  Groups  of  Robots 

We  mathematically  studied  collective  behavior  of  various  distributed  robot  sys¬ 
tems.  These  studies  were  inspired  and  corroborated  by  experiments  and  simu¬ 
lations  with  real  robots.  For  example,  Ijspeert  et  al.  [16]  studied  dynamics  of 
collaboration in groups of robots using stick-pulling experiments as a model of
collaboration. The robots’ task was to locate sticks scattered around the arena
and  pull  them  out  of  their  holes.  A  single  robot  cannot  complete  the  task  on  its 
own:  rather,  when  a  robot  finds  a  stick,  it  lifts  it  partially  out  of  the  hole  and 
waits  for  a  period  specified  by  its  gripping  time  parameter  for  a  second  robot 
to  find  it.  If  a  second  robot  finds  the  first  during  this  time  interval,  it  will  pull 
the  stick  out;  otherwise,  the  first  robot  releases  the  stick  and  returns  to  the 
searching  state. 

We  found  that  a  minimal  model  that  includes  only  the  salient  details  of  the 
process  [27]  reproduced  key  experimental  observations  and  qualitatively  agreed 
with  results  of  experiments  and  simulations  (see  Figure  2(a)).  Martinoli  & 
Easton  [30]  formulated  a  more  detailed  model  based  on  our  work  that  accounts 
for  every  state  in  the  robot  control  diagram. 


Figure 2: Collaboration rate per robot vs gripping time parameter for different
robot group sizes and 16 sticks. (a) Results of the minimal model for 8
(short dash), 16 (long dash) and 24 (solid line) robots. (b) Results for the
detailed model (solid lines), embodied simulations (dotted lines), and the
microscopic model (dashed lines).


Figure  2  depicts  the  collaboration  rate,  the  rate  at  which  robots  pull  sticks 
out,  as  a  function  of  the  individual  robot  gripping  time  parameter  for  the  mini¬ 
mal  (a)  and  the  detailed  (b)  models.  Figure  2(b)  also  shows  results  of  embodied 
and  probabilistic  numeric  simulations  for  the  same  set  of  parameters.  One  can 
see  quantitative  agreement  already  with  swarms  as  small  as  8  robots.  The  min¬ 
imal  model  shows  the  same  qualitative  behavior  as  the  more  detailed  model. 
See  Section  3.1  for  details  of  the  application. 

In  foraging  experiments,  we  studied  the  influence  of  physical  interference  on 
the  swarm  performance  [23].  Interference  is  a  critical  issue  in  swarm  robotics, 
in  particular  in  foraging  experiments  where  there  is  a  spatial  bottleneck  at 
the  predefined  “home”  region  where  the  collected  objects  must  be  delivered. 
When  two  robots  find  themselves  within  sensing  distance  of  one  another,  they 
will  execute  obstacle  avoidance  maneuvers.  Because  this  behavior  takes  time, 
interference decreases robots’ efficiency. Clearly, a single robot working alone is
relatively more efficient, because it does not experience interference from other
robots (the larger the swarm, the greater the degree of interference). However,
parallel work helps speed up the foraging process and increases the system
robustness in case of individual robot failures.

Figure 3: Time it takes the swarm of robots to collect objects in the arena for two
different interference strengths. Symbols are results of embodied simulations,
while lines give the model’s predictions.

Figure 3 shows the total time required to complete the task for two different
interference strengths, as measured by the avoiding time $\tau$. In both cases
task completion time is minimized at some swarm size and increases for larger
swarms. The greater the effect of interference (larger $\tau$), the smaller the
optimal swarm size. Results show good quantitative agreement with embodied
simulations with swarms of one to 20 robots. Section 3.2 presents details of the
application.

We  studied  extensions  of  the  basic  model  outlined  in  Section  1.2.  In  Sec¬ 
tion  3.3  we  analyze  dynamic  task  allocation  in  multi-robot  systems.  In  this  ap¬ 
plication,  robots  adapt  to  changing  task  requirements  and  environmental  condi¬ 
tions  by  making  repeated  local  observations  of  the  tasks,  environment  and  other 
robots.  Such  robots  can  be  described  as  general  Markov  processes  and  studied 
using  the  formalism  of  Section  1.3.  We  obtained  very  good  agreement  between 
predictions  of  the  model  and  results  of  realistic  3-D  simulations. 

Another  refinement  of  the  formalism,  as  described  in  Section  1.4,  applies  to 
spatially  non-uniform  systems,  for  example,  systems  where  robots  generate  and 
interact  with  diffusing  chemical  fields. 

2.2  Distributed  Resource  Allocation 

The problem of coordination in multi-agent systems, where agents have to
achieve a consensus in their actions to receive maximum reward, is an important
problem that has attracted much interest recently [41]. We studied minority
games and reinforcement learning as a model for the resource allocation/load
balancing problem in a large-scale MAS where resource capacities change
in time. We found that reinforcement learning [9] and minority games [13, 12]
were efficient and robust mechanisms for achieving coordination in dynamic
distributed systems. We applied this mechanism to load balancing in a Grid
distributed computing environment [10]. Section 4 presents details of this
application.

2.3  Collective  Mind  Project 

In addition to this work, the Business Collective Mind for Equipment Reliability
project was funded by DARPA at the level of a study. The Principal Investigator
was Norman Sondheimer of the University of Massachusetts, with William Wallace
of Rensselaer Polytechnic Institute and Peter Will of the University of Southern
California Information Sciences Institute as co-PIs. The tasks were to solicit ideas
from the best university, industry and military researchers and practitioners on
the Collective Mind concept, to generate support for a research program from
the military, and to report the results to DARPA. This project is described in
Section 5.


3  Robotic  Applications 

In  the  sections  below  we  illustrate  our  approach  to  modeling  and  analyzing 
collective  behavior  of  multi-agent  systems  with  detailed  applications  from  the 
robotics  domain. 

3.1  Collaboration  in  a  Group  of  Robots 

The stick-pulling experiments were carried out by Ijspeert et al. [16] to study
the  dynamics  of  collaboration  among  locally  interacting  simple  reactive  robots. 
Figure  4  is  a  snapshot  of  the  physical  set-up  of  the  experiments.  The  robots’  task 
is  to  locate  sticks  scattered  around  the  arena  and  pull  them  out  of  their  holes. 
A  single  robot  cannot  pull  the  stick  out  by  itself  —  a  collaboration  between 
two  robots  is  required  for  the  task  to  be  successfully  completed.  Collaboration 
occurs  in  the  following  way:  one  robot  finds  a  stick,  lifts  it  partly  out  of  the 
ground  and  waits  for  a  second  robot  to  find  it  and  complete  the  task  by  pulling 
the  stick  out  of  its  hole  completely. 

The  actions  of  each  robot  are  governed  by  the  same  simple  controller,  out¬ 
lined  in  Figure  5.  The  robot’s  default  behavior  is  to  wander  around  the  arena 
looking  for  sticks  and  avoiding  obstacles,  which  could  be  other  robots  or  walls. 
When  a  robot  finds  a  stick  that  is  not  being  held  by  another  robot,  it  grips 
it,  lifts  it  half  way  out  of  the  ground  and  waits  for  a  period  of  time  specified 
by  the  gripping  time  parameter.  If  no  other  robot  comes  to  its  aid  during  the 
waiting  period  (time  out),  the  robot  releases  the  stick  and  resumes  the  search 
for other sticks. If another robot encounters a robot holding a stick, a successful
collaboration will take place during which the second robot will grip the stick,
pulling it out of the ground completely, while the first robot releases the stick
and resumes the search. After the task is completed, the second robot also
releases the stick and returns to the search mode, and the experimenter replaces
the stick in its hole.

Figure 4: Physical set-up of the stick-pulling experiment showing six Khepera
robots.

3.1.1  Real Robots, Embodied Simulations and Microscopic Modeling

Ijspeert  et  al.  studied  the  dynamics  of  collaboration  in  the  stick-pulling  experi¬ 
ment  at  three  different  levels:  by  conducting  experiments  with  physical  robots; 
using  a  sensor-based  simulator  of  robots;  and  using  a  microscopic  probabilis¬ 
tic  model.  The  physical  experiments  were  carried  out  in  groups  of  two  to  six 
Khepera  robots  in  an  arena  containing  four  sticks.  Because  experiments  with 
physical  robots  are  very  time  consuming,  Webots,  the  sensor-based  simulator 
of  Khepera  robots,  was  used  to  systematically  explore  parameters  affecting  the 
dynamics  of  collaboration.  The  Webots  simulator  [32]  attempts  to  faithfully 
model  the  environment  and  replicate  the  experiment  by  reproducing  the  robots’ 
(noisy)  sensory  input  and  the  (noisy)  response  of  the  on-board  actuators  in  or¬ 
der  to  compute  the  trajectory  and  interactions  of  all  the  robots  in  the  arena. 
The  probabilistic  microscopic  model,  on  the  other  hand,  does  not  attempt  to 
compute  trajectories  of  individual  robots.  Rather,  it  is  a  numerical  model  in 
which  the  robot’s  actions  —  encountering  a  stick,  a  wall,  another  robot,  a  robot 
gripping  a  stick,  or  wandering  around  the  arena  —  are  represented  as  a  series  of 
stochastic  events,  with  probabilities  based  on  simple  geometric  considerations 
and  systematic  tests  with  one  or  two  real  robots.  For  example,  the  probability 
of a robot encountering a stick is equal to the product of the number of
ungripped sticks and the detection area of the stick normalized by the arena area.
Probabilities of other interactions can be similarly calculated. The microscopic
simulation consists of running several processes in parallel, one for each robot,
while keeping track of the global state of the environment, such as the number
of gripped and ungripped sticks. According to Ijspeert et al., Webots runs
between one and two orders of magnitude faster than the real robots for the
experiments presented here. Because the probabilistic model does
not require calculations of the details of the robots’ trajectories, it is about 300
times faster than Webots for this experiment.
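
To make the idea of a probabilistic microscopic model concrete, the following Python sketch simulates a heavily simplified version of it. It is not the simulator of Ijspeert et al.: walls, obstacle avoidance and the duration of individual actions are ignored, the function name is hypothetical, and the per-step encounter probabilities `p_free` and `p_held` are placeholder values (0.02 per one-second step for a free stick, and 35% of that for a stick held by another robot, borrowed from the figures quoted later in this report).

```python
import random

def simulate_stick_pulling(n_robots, n_sticks, grip_time, steps=20000,
                           p_free=0.02, p_held=0.007, seed=0):
    """Return the number of successful collaborations per time step."""
    rng = random.Random(seed)
    # timers[r] == 0: robot r is searching; > 0: steps left before it times out.
    timers = [0] * n_robots
    collaborations = 0
    for _ in range(steps):
        for r in rng.sample(range(n_robots), n_robots):  # random update order
            if timers[r] > 0:
                timers[r] -= 1      # keep waiting; reaching 0 means a time-out
                continue
            holders = [i for i, left in enumerate(timers) if left > 0]
            free_sticks = n_sticks - len(holders)
            # Encounter probabilities scale with the number of available targets.
            p_find_free = p_free * free_sticks
            p_find_held = p_held * len(holders)
            u = rng.random()
            if u < p_find_free:
                timers[r] = grip_time            # grip the stick, wait for help
            elif u < p_find_free + p_find_held:
                timers[rng.choice(holders)] = 0  # help: the stick is pulled out
                collaborations += 1              # and then replaced in its hole
    return collaborations / steps

for grip_time in (10, 50, 200):
    print(grip_time, simulate_stick_pulling(2, 4, grip_time))
```

Because only event probabilities are sampled and no trajectories are computed, the cost per step is proportional to the number of robots, which is the source of the speed-up over a sensor-based simulator.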


Figure  5:  Flowchart  of  the  robots’  controller  reported  from  [16]  with  overlapped 
state  blocks. 

Ijspeert et al. systematically studied the collaboration rate, i.e., the number
of  sticks  successfully  pulled  out  of  the  ground  in  a  given  time  interval,  and  its 
dependence  on  the  group  size  and  the  gripping  time  parameter.  Though  in 
that  work  they  also  investigated  the  effects  of  robot  heterogeneity  and  explicit 
communication,  we  will  focus  on  a  homogeneous  system  of  non-communicating 
robots.  Ijspeert  et  al.  report  very  good  qualitative  and  quantitative  agreement 
between  the  three  different  levels  of  experiments.  The  main  result  is  that, 
depending  on  the  ratio  of  robots  to  sticks  (or  workers  to  the  amount  of  work), 
there  appear  to  be  two  different  regimes  in  the  collaboration  dynamics.  When 
there  are  fewer  robots  than  sticks,  the  collaboration  rate  decreases  to  zero  as 
the  value  of  the  gripping  time  parameter  grows.  In  the  extreme  case,  when  the 
robot  grabs  a  stick  and  waits  indefinitely  for  another  robot  to  come  and  help  it, 
the collaboration rate is zero, because after some period of time each robot ends
up holding a stick, and no robots are available to help. When there are more
robots than sticks, the collaboration rate remains finite even in the limit where
the gripping time parameter becomes infinite, because there will always be robots
available  to  help  pull  the  sticks  out.  Another  finding  of  Ijspeert  et  al.  was  that 
when  there  are  fewer  robots  than  sticks,  there  is  an  optimal  value  of  the  gripping 
time  parameter  which  maximizes  the  collaboration  rate.  In  the  other  regime, 
the  collaboration  rate  appears  to  be  independent  of  the  gripping  time  parameter 
above  a  specific  value,  so  the  optimal  strategy  is  for  the  robot  to  grip  a  stick  and 
hold  it  indefinitely.  They  also  found  that  the  system  is  one  of  few  collaborative 
systems known to the authors that demonstrates super-linearity, i.e., for some
range of robot group sizes and a given number of sticks, adding a robot not only
increases the global performance of the system but also the relative performance
of the other robots. However, as the robot group size increases, the overcrowding
and interference effects cause the relative collaboration rate to saturate and
become sub-linear.

3.1.2  Mathematical  Model  of  the  Stick-Pulling  Experiments 

In the following sections we present a macroscopic analytical model of the
stick-pulling experiments in a homogeneous multi-robot system. Such a model is
useful for the following reasons. First, the complexity of a macroscopic model
is independent of the system size, i.e., the number of robots: therefore, the
time required to obtain solutions for a system of 5,000 robots is as long as
that to obtain solutions for a system of five robots, whereas for a microscopic
description the time required for computer simulation scales at least linearly
with the number of robots. Second, our approach allows us to derive analytic
expressions for certain important parameters (e.g., those for which the
performance is optimal). It also enables us to study the stability properties of
the  system,  and  see  whether  solutions  are  robust  under  external  perturbation 
or  noise.  These  capabilities  are  important  for  the  design  and  control  of  large 
multi-agent  systems. 

In  order  to  construct  a  model  of  the  stick-pulling  experiments,  it  is  helpful  to 
write  the  macroscopic  state  diagram  of  the  system.  During  a  sufficiently  short 
time  interval,  each  robot  can  be  thought  to  be  in  one  of  two  states:  searching 
or  gripping.  The  state  labels  several  related  robot  behaviors  and  it  is  a  useful 
shorthand for thinking about the system. Using the flowchart of the robots’
controller, shown in Fig. 5, as a reference, we can consider the search state to be the
set  of  behaviors  associated  with  looking  for  sticks,  such  as  wandering  around  the 
arena  (“look  for  sticks”  action),  detecting  objects  and  avoiding  obstacles;  while 
the  gripping  state  is  composed  of  the  decisions  and  actions  inside  the  dotted  box. 
We  assume  that  actions  “success”  (pull  the  stick  out  completely)  and  “release” 
(release  the  stick)  take  place  on  a  short  enough  time  scale  that  they  can  be 
incorporated  into  the  search  state.  While  the  robot  is  in  the  obstacle  avoidance 
mode,  it  cannot  detect  and  try  to  grip  objects;  therefore,  avoidance  serves  to 
decrease  the  number  of  robots  that  are  searching  and  capable  of  gripping  sticks. 
We can also include avoidance into the model explicitly [27].

In addition to states, we must also specify all possible transitions between
states. When it finds a stick, the robot makes a transition from the search state
to the gripping state. After either a successful collaboration or a time-out
(unsuccessful collaboration), the robot releases the stick and makes a transition
into the searching state, as shown in Fig. 6. These arrows correspond to the
arrow entering and the two arrows leaving the dotted box in Fig. 5. We will
use the macroscopic state diagram as the basis for writing down the differential
rate equations that describe the dynamics of the stick-pulling experiments.


Figure 6: Macroscopic state diagram of the multi-robot system. The arrow
marked ‘s’ corresponds to the transition from the gripping to the searching
state after a successful collaboration, while the arrow marked ‘u’ corresponds
to the transition after an unsuccessful collaboration, i.e., when the robots time
out.

The dynamic variables of the model are $N_s(t)$ and $N_g(t)$, the number of
robots in the searching and gripping states respectively. Also, let $M(t)$ be the
number of unextracted sticks at time $t$. The latter variable does not represent
a macroscopic state; rather, it tracks the state of the environment. We assume
that robots and sticks are distributed uniformly around the arena.

A series of differential rate equations govern the dynamics of the stick-pulling
system:

\[ \frac{dN_s}{dt} = -\alpha N_s(t)\left[M(t) - N_g(t)\right] + \tilde{\alpha} N_s(t) N_g(t) + \alpha N_s(t-\tau)\left[M(t-\tau) - N_g(t-\tau)\right]\Gamma(t;\tau) \qquad (14) \]

\[ N_g = N_0 - N_s \qquad (15) \]

\[ \frac{dM}{dt} = -\tilde{\alpha} N_s(t) N_g(t) + \mu(t) \qquad (16) \]

where $\alpha$ and $\tilde{\alpha}$ are the rates at which a searching robot encounters a stick and a
gripping robot respectively, $\tau$ is the gripping time parameter, and $\mu(t)$ is the rate
at which new tasks are added. The parameters $\alpha$, $\tilde{\alpha}$, and $\tau$ connect the model
to the experiment: $\alpha$ and $\tilde{\alpha}$ are related to the size of the object, the robot’s
detection radius, or footprint, and the speed at which it explores the arena.
The three terms in Eq. 14 correspond to the three arrows in Fig. 6. The first
term accounts for the decrease in the number of searching robots because some
robots find and grip sticks. Under the uniform distribution assumption, the
rate at which robots encounter ungripped sticks is proportional to the number
of ungripped sticks in the arena, with the proportionality factor given by $\alpha$.
The second term describes the successful collaborations between two robots,
and the third term accounts for the failed collaborations, both of which lead to
an increase in the number of searching robots.

$\Gamma(t;\tau)$, the fraction of failed collaborations at time $t$, is the probability that no
robot came “to help” during the time interval $[t-\tau, t]$. To calculate $\Gamma(t;\tau)$ let us
divide the time interval $[t-\tau, t]$ into $K$ small intervals of length $\delta t = \tau/K$. The
probability that no robot comes to help during the time interval $[t-\tau, t-\tau+\delta t]$
is simply $1 - \tilde{\alpha} N_s(t-\tau)\,\delta t$. Hence, the probability for a failed collaboration is

\[ \Gamma(t;\tau) = \prod_{i=1}^{K}\left[1 - \tilde{\alpha}\,\delta t\, N_s(t-\tau+i\,\delta t)\right]\Theta(t-\tau) = \exp\left[\sum_{i=1}^{K}\ln\left[1 - \tilde{\alpha}\,\delta t\, N_s(t-\tau+i\,\delta t)\right]\right]\Theta(t-\tau) \qquad (17) \]

The step function $\Theta(t-\tau)$ ensures that $\Gamma(t;\tau)$ is zero for $t < \tau$. Finally,
expanding the logarithm in Eq. 17 and taking the limit $\delta t \to 0$ we obtain

\[ \Gamma(t;\tau) = \exp\left[-\tilde{\alpha}\int_{t-\tau}^{t} dt'\, N_s(t')\right]\Theta(t-\tau) \qquad (18) \]
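
The passage from the product over sub-intervals to the exponential of an integral can be verified numerically. In the Python sketch below, the searching-robot profile `n_s` and the rate `ALPHA_T` are hypothetical values chosen only for illustration; the check is made for a time t larger than the gripping time, so the step function equals one.

```python
import math

ALPHA_T = 0.007  # assumed rate of encountering a gripping robot (1/s)

def n_s(t):
    """Hypothetical number of searching robots, used only for illustration."""
    return 3.0 + math.sin(t)

def gamma_product(t, tau, k):
    """Product form of the failure probability with K sub-intervals (t >= tau)."""
    dt = tau / k
    prod = 1.0
    for i in range(1, k + 1):
        prod *= 1.0 - ALPHA_T * dt * n_s(t - tau + i * dt)
    return prod

def gamma_exponential(t, tau):
    """Integral form; the integral of 3 + sin(t') over [t - tau, t] is analytic."""
    integral = 3.0 * tau + math.cos(t - tau) - math.cos(t)
    return math.exp(-ALPHA_T * integral)

t, tau = 60.0, 50.0
for k in (10, 100, 1000, 10000):
    print(k, gamma_product(t, tau, k), gamma_exponential(t, tau))
```

As K grows, each factor approaches 1 and ln(1 - x) is well approximated by -x, so the printed product values approach the exponential form.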


We do not need a differential equation for $N_g$, the number of gripping robots,
because this quantity may be computed using the conservation of robots condition,
Eq. 15. The last equation, Eq. 16, says that the number of unextracted sticks
$M(t)$ decreases in time at the rate of successful collaborations. The equations
are subject to the initial conditions that at $t = 0$ the number of searching robots
is $N_0$ and the number of unextracted sticks is $M_0$.


Dimensional Analysis  To proceed further let us introduce $n(t) = N_s(t)/N_0$,
$m(t) = M(t)/M_0$, $\beta = N_0/M_0$, $R_G = \tilde{\alpha}/\alpha$, $\tilde{\beta} = R_G\,\beta$ and a dimensionless time
$t \to \alpha M_0 t$, $\tau \to \alpha M_0 \tau$. $\mu'$ is the dimensionless rate at which new tasks (sticks)
are added. $n(t)$ is the fraction of robots in the search state and $m(t)$ is the
fraction of unextracted sticks at time $t$. Due to the conservation of the number
of robots, the fraction of robots in the gripping state is simply $1 - n(t)$.
Equations 14-16 can be rewritten in dimensionless form as:

\[ \frac{dn}{dt} = -n(t)\left[m(t) + \beta n(t) - \beta\right] + \tilde{\beta}\, n(t)\left[1 - n(t)\right] + n(t-\tau)\left[m(t-\tau) + \beta n(t-\tau) - \beta\right]\gamma(t;\tau) \qquad (19) \]

\[ \frac{dm}{dt} = -\beta\tilde{\beta}\, n(t)\left[1 - n(t)\right] + \mu' \qquad (20) \]

\[ \gamma(t;\tau) = \exp\left[-\tilde{\beta}\int_{t-\tau}^{t} dt'\, n(t')\right] \qquad (21) \]

Equations 19-21 together with the initial conditions $n(0) = 1$, $m(0) = 1$
determine the dynamical evolution of the system. Note that only two parameters, $\beta$
and $\tau$, appear in the equations and, thus, determine the behavior of solutions.
The third parameter, $\tilde{\beta} = R_G\,\beta$, is fixed experimentally and is not independent.
Note that we do not need to specify $\alpha$ and $\tilde{\alpha}$: they enter the model only
through $R_G$ (throughout this paper we will use $R_G = 0.35$, the value reported
in [16]).$^3$ Below we provide a detailed analysis of these equations.
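
The dimensionless delay equation for n(t) can be integrated with a simple forward Euler scheme, as in the hedged Python sketch below. The function name, the step size and the integration length are choices made here for illustration. The sketch assumes that sticks are replaced as fast as they are extracted, so that m(t) = 1, and it switches the delayed term on only for t >= tau, following the step function in the definition of the failure probability.

```python
import math

def integrate_eq19(beta, tau, r_g=0.35, t_end=150.0, h=0.01):
    """Forward-Euler solution of the delay equation for n(t) with m(t) = 1.

    Returns the list of n values at times 0, h, 2h, ...
    """
    beta_t = r_g * beta              # beta-tilde = R_G * beta
    delay = int(round(tau / h))      # gripping time measured in steps
    n = [1.0]                        # initial condition n(0) = 1
    cum = [0.0]                      # running integral of n from 0 to t
    for k in range(int(round(t_end / h))):
        nk = n[k]
        # Loss to gripping plus gain from successful collaborations.
        rate = -nk * (1 + beta * nk - beta) + beta_t * nk * (1 - nk)
        if k >= delay:
            nd = n[k - delay]
            # Probability that a robot gripping since t - tau got no help.
            gamma = math.exp(-beta_t * (cum[k] - cum[k - delay]))
            rate += nd * (1 + beta * nd - beta) * gamma  # time-outs
        n.append(nk + h * rate)
        cum.append(cum[k] + h * nk)
    return n

solution = integrate_eq19(beta=0.5, tau=5.0)
print("fraction of searching robots at the end of the run:", solution[-1])
```

The parameter values beta = 0.5 and tau = 5 are those used for the relaxation curve in the inset of Fig. 7.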

Analysis of Results  Let us assume that new sticks are added to the system
at the same rate that the robots pull them out. This situation was realized
experimentally by replacing the sticks in their holes after they were pulled out
by robots. Therefore, the number of sticks does not change with time ($m(t) =
m(0) = 1$). A steady-state solution, if it exists, describes the long term
time-independent behavior of the system. To find it, we set the left hand side of
Eq. 19 to zero. Eq. 19 has a non-trivial steady-state solution which satisfies the
following transcendental equation:

\[ -1 + (\beta + \tilde{\beta})(1 - n) + \left(1 - \beta(1 - n)\right)e^{-\tilde{\beta}\tau n} = 0 \qquad (22) \]

Figure 7 shows the dependence of the fraction of searching robots in the steady
state on the gripping time $\tau$ for different values of the parameter $\beta$. Note that
for small enough $\beta$, $n(\tau) \to 0$ as $\tau \to \infty$. The intuitive reason for this is the
following: when there are fewer robots than sticks, and each robot holds the
stick indefinitely, after a while every robot is holding a stick, and no robots are
searching. For $\beta > 1/(1 + R_G)$, however, $n(\tau) \to \mathrm{const} \neq 0$ as $\tau \to \infty$. The
inset in Fig. 7 shows how a typical solution, $n(t)$, relaxes to its steady state
value. The oscillations are characteristic of time-delay differential equations,
and their period is determined by $\tau$.
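
The transcendental steady-state condition has no closed-form solution, but it is easy to solve numerically. The Python sketch below uses bisection on the interval [0, 1]; the function name and tolerance are illustrative choices. Bisection applies because the left-hand side equals R_G times beta (positive) at n = 0 and exp(-R_G beta tau) - 1 (negative) at n = 1 for any positive gripping time.

```python
import math

def steady_state(beta, tau, r_g=0.35, tol=1e-12):
    """Non-trivial steady-state fraction of searching robots for m = 1."""
    beta_t = r_g * beta

    def f(n):
        return (-1 + (beta + beta_t) * (1 - n)
                + (1 - beta * (1 - n)) * math.exp(-beta_t * tau * n))

    lo, hi = 0.0, 1.0    # f(lo) > 0 and f(hi) < 0, so a root lies in between
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for beta in (0.5, 1.0, 1.5):
    print(beta, [round(steady_state(beta, tau), 3) for tau in (1, 5, 20, 100)])
```

For very long gripping times the exponential term vanishes whenever n stays finite, so the root approaches 1 - 1/(beta (1 + R_G)) when that value is positive and approaches zero otherwise, which is the distinction between the two regimes described above.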

Figure 7: Steady state solution vs (dimensionless) gripping time parameter $\tau$
for $\beta = 0.5$ (short dash), 1 (long dash), 1.5 (solid line). Inset shows a typical
relaxation to the steady state for $\tau = 5$, $\beta = 0.5$.

Figure 8: Collaboration rate per robot vs (dimensionless) gripping time
parameter $\tau$ for $\beta = 0.5$ (short dash), $\beta = 1$ (long dash), $\beta = 1.5$ (solid line).
These values of $\beta$ correspond, respectively, to two, four, and six robots in the
experiments with four sticks.

$^3$ The parameter $\alpha$ can be easily calculated from experimental values quoted in [16]. As
a robot travels through the arena, it sweeps out some area during time $dt$ and will detect
objects that fall in that area. This detection area is $V_R W_R\,dt$, where $V_R = 8.0$ cm/s is the robot’s
speed and $W_R = 14.0$ cm is the robot’s detection width. If the arena radius is $R = 40.0$ cm, a
robot will detect sticks at the rate $\alpha = V_R W_R/(\pi R^2) = 0.02\ \mathrm{s}^{-1}$. According to [16], a robot’s
probability to grab a stick already being held by another robot is 35% of the probability of
grabbing a free stick. Therefore, $R_G = \tilde{\alpha}/\alpha = 0.35$. $R_G$ is an experimental value obtained
with systematic experiments with two real robots, one holding the stick and the other one
approaching the stick from different angles.

The collaboration rate is the rate at which robots successfully pull sticks
out of their holes. The steady-state collaboration rate $R(\tau, \beta)$ is given by the
following equation:

\[ R(\tau, \beta) = \beta\tilde{\beta}\, n(\tau, \beta)\left[1 - n(\tau, \beta)\right], \qquad (23) \]

where $n(\tau, \beta)$ is the fraction of searching robots in the steady state for a
particular value of $\tau$ and $\beta$, and $1 - n(\tau, \beta)$ is the fraction of gripping robots in
the steady state. Figure 8 depicts the collaboration rate as a function of $\tau$.
For $\beta > \beta_c$ the collaboration rate increases monotonically with $\tau$. However,
for $\beta < \beta_c$ there is an optimal gripping time, $\tau = \tau_{opt}$, which maximizes the
collaboration rate. To understand this behavior note that the maximum
collaboration rate for a given $\beta$ is achieved for $n(\tau, \beta) = 1/2$. For $\beta > \beta_c$, however,
the solution of Eq. 22 is always greater than 1/2, so an optimal solution does
not exist. For $\beta < \beta_c$ a simple analysis gives

\[ \tau_{opt} = \frac{2}{\tilde{\beta}} \ln\frac{1 - \beta/2}{1 - (\beta + \tilde{\beta})/2}, \qquad \beta < \beta_c = \frac{2}{1 + R_G} \qquad (24) \]
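
The optimal gripping time of Eq. 24 follows from setting n = 1/2 in Eq. 22 and solving for the gripping time. The Python sketch below evaluates it, as read from Eq. 24, and confirms that n = 1/2 then satisfies the steady-state condition. The function names are illustrative. The conversion to seconds divides by alpha times M_0 and assumes the values alpha = 0.02 per second from footnote 3 and M_0 = 4 sticks from the experiments of Fig. 8.

```python
import math

R_G = 0.35     # ratio of encounter rates, the value quoted from [16]
ALPHA = 0.02   # stick encounter rate in 1/s, from footnote 3
M0 = 4         # number of sticks in the experiments of Fig. 8

def beta_c(r_g=R_G):
    """Critical ratio of robots to sticks below which an optimum exists."""
    return 2.0 / (1.0 + r_g)

def tau_opt(beta, r_g=R_G):
    """Dimensionless gripping time that maximizes the collaboration rate."""
    if beta >= beta_c(r_g):
        raise ValueError("no finite optimal gripping time for beta >= beta_c")
    beta_t = r_g * beta
    return (2.0 / beta_t) * math.log((1 - beta / 2) / (1 - (beta + beta_t) / 2))

def eq22_residual(n, beta, tau, r_g=R_G):
    """Left-hand side of the steady-state condition, Eq. 22."""
    beta_t = r_g * beta
    return (-1 + (beta + beta_t) * (1 - n)
            + (1 - beta * (1 - n)) * math.exp(-beta_t * tau * n))

for beta in (0.5, 1.0):
    t_opt = tau_opt(beta)
    print(f"beta={beta}: tau_opt={t_opt:.3f} (dimensionless), "
          f"{t_opt / (ALPHA * M0):.1f} s, "
          f"Eq. 22 residual at n=1/2: {eq22_residual(0.5, beta, t_opt):.1e}")
```

The logarithm diverges as beta approaches beta_c from below, which is why the sketch raises an error rather than returning a value in the regime where the collaboration rate grows monotonically with the gripping time.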


Mathematical analysis of the minimal model reproduces the following
conclusions of Ijspeert et al.: the different dynamical regimes depending on the
value of the ratio of robots to sticks ($\beta$) and the optimal gripping time
parameter for $\beta < \beta_c$. The three curves in Fig. 8 are qualitatively similar to the
results of simulations in groups of up to six robots. Martinoli & Easton [30]
formulated a more detailed model based on our work that accounts for every
state in the robot control diagram and agrees quantitatively with simulations
of groups of as few as a dozen robots.


3.2  Optimal  Group  Size  for  Robot  Foraging 

Figure  9  is  a  snapshot  of  a  typical  foraging  experiment  with  four  robots.  The 
robots’  task  is  to  collect  small  pucks  scattered  randomly  around  the  arena. 
The  arena  itself  is  divided  into  a  search  region  and  a  small  “home”,  or  goal, 
region  where  the  collected  pucks  are  deposited.  The  “boundary”  and  “buffer” 
regions  are  part  of  the  home  region  and  are  made  necessary  by  limitations  in 
the  robots’  sensing  capabilities,  as  described  below.  Each  robot  has  an  identical 
set  of  behaviors  governed  by  the  same  controller.  The  behaviors  that  arise  in 
the  collection  task  are  [15]: 

Avoiding  obstacles,  including  other  robots  and  boundaries.  This  behavior  is 
critical  to  the  safety  of  the  robot. 

Wandering  or  searching  for  pucks:  robot  moves  forward  and  at  random  in¬ 
tervals  turns  left  or  right  through  a  random  arc.  If  the  robot  enters  the 
Boundary  region,  it  returns  to  the  search  region.  This  prevents  the  robot 
from  collecting  pucks  that  have  already  been  delivered. 

Detecting  a  puck. 

Grabbing  a  puck. 

Homing: if carrying a puck, move towards the home location.

Creeping: activated by entering Buffer region. The robot will start using the
close-range detectors at this point to avoid the boundaries.

Home: robot drops the puck. This activates the exiting behavior.

Exiting: robot exits the home region and resumes search.

Figure 9: Diagram of the foraging arena (courtesy of D. Goldberg).


3.2.1  Interference 

In  the  foraging  scenario  outlined  above,  robots  act  completely  independently, 
without  communicating  directly  or  through  the  environment.  Interference  is  the 
only  interaction  between  the  robots,  and  it  is  caused  by  competition  for  space 
between  spatially  extended  robots.  When  two  robots  find  themselves  within 
sensing  distance  of  one  another,  they  will  execute  obstacle  avoiding  maneuvers 
in  order  to  reduce  the  risk  of  a  potentially  damaging  collision.  The  robot  stops, 
turns  in  place  by  some  angle  and  moves  forward.  This  behavior  takes  time  to 
execute;  therefore,  avoidance  increases  the  time  it  takes  the  robot  to  find  pucks 
and  deliver  them  home.  Clearly,  a  single  robot  working  alone  will  not  experience 
interference  from  other  robots.  However,  if  a  single  robot  fails,  as  is  likely  in 
a  dynamic,  hostile  environment,  the  collection  task  will  not  be  completed.  A 
group  of  robots,  on  the  other  hand,  is  robust  to  an  individual’s  failure.  Indeed, 
many  robots  may  fail  but  the  performance  of  the  group  may  be  only  moderately 
affected.  Many  robots  working  in  parallel  may  also  speed  up  the  collection  task. 
Of  course,  the  larger  the  group,  the  greater  the  degree  of  interference  —  in  the 
extreme  case  of  a  crowded  arena,  robots  will  spend  all  their  time  avoiding  other 
robots  and  will  not  bring  any  pucks  home. 

Interference has long been recognized as a critical issue in multi-robot systems
[8, 43]. Several approaches to minimize interference have been explored,
including communication [37] and cooperative strategies such as trail formation
[46] and bucket brigade [8, 35]. In some cases, the effectiveness of the strategy
to minimize interference will also depend on the group size [35]. Therefore, it
is important to quantitatively understand interference between robots and how
it relates to the group and task sizes before choosing alternatives to the default
strategy. For some tasks and a given controller, there may exist an optimal group
size that maximizes the performance of the system [34, 8, 35]. Beyond this size
the adverse effects of interference become more important than the benefits of
increased robustness and parallelism, and it may become beneficial to choose
an alternate foraging strategy. We will study interference mathematically and
attempt to answer these questions.

3.2.2  Mathematical  Analysis  of  Foraging 

As mentioned above, interference is the result of competition between two or
more robots for the same resource, be it physical space, a puck both are trying
to pick up, energy, a communications channel, etc. In the collection and foraging
tasks, competition for physical space, and the resulting avoidance of collisions
with other robots, is the most common source of interference. In order to
understand interference quantitatively, we will first examine the simplified
foraging task that includes searching and avoiding only. This task can be
implemented with a subset of the robot behaviors listed in Section 3.2, namely
searching, avoiding, detecting a puck and grabbing it. This scenario may be
realized experimentally by allowing robots to pick up a puck and store it in a
carrying pouch, for instance. Then we will examine the full foraging scenario,
where robots are required to deliver collected pucks to a home location.

Above  we  described  a  methodology  for  constructing  mathematical  models  of 
collective  behavior  of  multi-agent  systems.  The  methodology  applies  to  Markov 
systems,  in  which  each  agent’s  state  at  a  future  time  depends  only  on  its  present 
state and none of its past states. While this may seem a restrictive criterion,
it  is  satisfied  by  many  behavior-based  and  reactive  robot  systems.  In  the  context 
of  robotics,  state  labels  a  set  of  related  robot  behaviors  required  to  accomplish  a 
task.  Thus,  the  search  state  may  consist  of  the  wandering  and  puck  detecting 
behaviors,  or  we  may  simply  take  each  behavior  to  be  a  separate  state.  The 
mathematical  model  consists  of  a  series  of  coupled  differential  equations,  one  for 
each  state,  each  of  which  describes  how  the  average  number  of  agents  in  that 
state  changes  in  time.  The  equations  may  be  solved  analytically  or  numerically, 
allowing  us  to  quantitatively  study  the  behavior  of  the  multi-agent  system. 
Below  we  construct  and  solve  a  mathematical  model  of  two  foraging  scenarios, 
with  an  emphasis  on  analyzing  the  effects  of  interference. 

Figure 10 shows the state diagram for foraging with homing. Initially the
robots are in the search state. When a searching robot encounters a puck, it
picks it up and moves toward the “home” region. Execution of the homing
behavior requires a period of time $\tau_h$. At the end of this period, the robot
deposits the puck at home and resumes the search for more pucks. While a
robot is either searching or homing, it may encounter obstacles, which it tries
to avoid for a time period $\tau$, after which it returns to its previous state.
There are two separate avoiding states to preclude robots from moving from the
searching to the homing state, or vice versa, through a common avoiding state.

Figure 10: State diagram of a multi-robot foraging system with homing.

Each state in the diagram corresponds to a dynamic variable. Let $N_s(t)$,
$N_h(t)$, $N_s^{av}(t)$, $N_h^{av}(t)$ be the number of searching, homing, avoiding
while searching and avoiding while homing robots at time $t$, with the total
number of robots, $N_0 = N_s(t) + N_h(t) + N_s^{av}(t) + N_h^{av}(t)$, a constant.
We model the environment by letting $M(t)$ be the number of undelivered pucks
at time $t$. Also, let $\alpha_r$ be the rate of detecting another robot and $\alpha_p$
the rate of detecting a puck. These parameters connect the model to the
experiment, and they are related to the size of the robot and the puck, the
robot’s detection radius and the speed of the robot. It was shown experimentally
[15] that interference is most pronounced near the home region, because the
density of robots is, on average, greater there. Therefore, we expect the rate of
encountering other robots to be greater near the home region and introduce
$\alpha'_r$, the rate of detecting another robot while homing. The following
equations describe the time evolution of the dynamic variables (footnote 4):

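\begin{align}
\frac{dN_s}{dt} &= -\alpha_p N_s(t)\,[M(t) - N_h(t) - N_h^{av}(t)]
   - \alpha_r N_s(t)\,[N_s(t) + N_0] + \frac{1}{\tau_h} N_h(t) + \frac{1}{\tau} N_s^{av}(t), \tag{25} \\
\frac{dN_h}{dt} &= \alpha_p N_s(t)\,[M(t) - N_h(t) - N_h^{av}(t)]
   - \alpha'_r N_h(t)\,[N_h(t) + N_0] + \frac{1}{\tau} N_h^{av}(t) - \frac{1}{\tau_h} N_h(t), \tag{26} \\
\frac{dN_h^{av}}{dt} &= \alpha'_r N_h(t)\,[N_h(t) + N_0] - \frac{1}{\tau} N_h^{av}(t), \tag{27} \\
\frac{dM}{dt} &= -\frac{1}{\tau_h} N_h(t). \tag{28}
\end{align}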

Footnote 4: For simplicity, we do not include wall avoidance in the equations,
but do take it into account when fitting the model to the data.

The first two terms in Eq. 25 account for a decrease in the number of searching
robots when robots find pucks and start homing, or when searching robots
encounter and attempt to avoid other robots. The number of available pucks
is just the number of pucks in the arena less the pucks held by homing robots.
When a searching robot encounters another searching robot, both start executing
avoidance maneuvers, decreasing the number of searching robots by two;
when a searching robot encounters a homing robot or either kind of avoiding
robot, the number of searching robots decreases by one. The total decrease is,
therefore, proportional to $2N_s + N_h + N_s^{av} + N_h^{av} = N_s + N_0$. The last
two terms in the equation require more explanation. We assume that it takes on
average a time $\tau_h$ for a robot to reach home after grabbing a puck. Then the
average number of robots that deliver pucks during a short time interval $dt$ and
return to the searching state can be approximated as $dt\,N_h/\tau_h$. Likewise, in a
period of time $dt$, $dt\,N_s^{av}/\tau$ robots leave the avoiding state and resume
searching. Interference will increase the homing time for each robot; therefore,
in general, homing time will be a function of $N_0$, $\tau$ and $\tau_h^0$, the average
homing time in the absence of collisions with other robots. For low to moderate
robot densities, it is reasonable to assume the increase will be linear in the
interference strength. The effective homing time can, therefore, be modeled as

\[ \tau_h = \tau_h^0\,[1 + \alpha_r \tau N_0]. \tag{29} \]

The remaining equations have similar interpretations. We can take advantage
of the conservation of the total number of robots to compute $N_s^{av}(t)$.
Equations 25-28 are solved numerically under the conditions that initially, at $t = 0$,
there  are  M0  pucks  and  N0  searching  robots. 
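
The numerical solution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses plain forward-Euler stepping (the text does not say which integrator was used), reads Eq. 29 as $\tau_h = \tau_h^0 [1 + \alpha_r \tau N_0]$, and takes the parameter values quoted for Figures 11 and 12 as defaults. The function and variable names are hypothetical.

```python
def simulate_foraging(n0=5.0, m0=20.0, tau=3.0, tau_h0=16.0,
                      alpha_p=0.02, alpha_r=0.04, alpha_r_home=0.08,
                      dt=0.01, t_end=800.0):
    """Integrate Eqs. 25-28 with forward Euler (an assumed integrator).

    Returns a list of samples (t, n_s, n_h, n_s_av, n_h_av, m).
    """
    # Eq. 29, read here as tau_h = tau_h0 * (1 + alpha_r * tau * N0).
    tau_h = tau_h0 * (1.0 + alpha_r * tau * n0)

    # Initial condition: M0 pucks, all N0 robots searching.
    n_s, n_h, n_h_av, m = float(n0), 0.0, 0.0, float(m0)
    samples = []
    steps = int(round(t_end / dt))
    for k in range(steps + 1):
        # Conservation of robots gives the "avoiding while searching" count.
        n_s_av = n0 - n_s - n_h - n_h_av
        samples.append((k * dt, n_s, n_h, n_s_av, n_h_av, m))

        pickup = alpha_p * n_s * (m - n_h - n_h_av)   # searching -> homing
        avoid_s = alpha_r * n_s * (n_s + n0)          # searching -> avoiding
        avoid_h = alpha_r_home * n_h * (n_h + n0)     # homing -> avoiding

        d_n_s = -pickup - avoid_s + n_h / tau_h + n_s_av / tau    # Eq. 25
        d_n_h = pickup - avoid_h + n_h_av / tau - n_h / tau_h     # Eq. 26
        d_n_h_av = avoid_h - n_h_av / tau                         # Eq. 27
        d_m = -n_h / tau_h                                        # Eq. 28

        n_s += dt * d_n_s
        n_h += dt * d_n_h
        n_h_av += dt * d_n_h_av
        m += dt * d_m
    return samples


if __name__ == "__main__":
    for t, n_s, n_h, n_s_av, n_h_av, m in simulate_foraging()[::10000]:
        print(f"t={t:6.1f} s  searching fraction={n_s / 5.0:.3f}  "
              f"undelivered pucks fraction={m / 20.0:.3f}")
```

Dividing the searching-robot count by $N_0$ and the puck count by $M_0$ gives the two fractions plotted against time; a smaller step or a higher-order integrator could be substituted if more accuracy is needed.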

Figure 11 shows the time evolution of the fraction of searching robots and
pucks for $M_0 = 20$, $N_0 = 5$, $\tau = 3$ s, $\tau_h^0 = 16$ s. The number of searching
robots  (solid  line)  first  quickly  decreases  as  robots  find  pucks  and  carry  them 
home,  but  then  it  increases  and  saturates  at  some  steady  state  value  as  the 
number  of  undelivered  pucks  approaches  zero  (dashed  line).  The  fraction  of 
searching  robots  in  the  steady  state  is  inversely  proportional  to  the  avoiding 
time  parameter. 


Figure 11: Time evolution of the fraction of searching robots (solid line) and
undelivered pucks (dashed line) for $\tau = 3$ s, $\alpha_p = 0.02$, $\alpha_r = 0.04$, and $\alpha'_r = 0.08$.

In order to compare the performance of different size groups, we define the
efficiency of the system as the inverse of the time required for the group to
collect 80% of the pucks ($M(T_{80\%})/M_0 = 0.2$ in Fig. 11(a)). Figure 12(a) shows
the efficiency of the group vs. group size for two different interference
strengths, as measured by $\tau$. Unlike in the searching-and-avoiding task, in both
cases the efficiency has a maximum at some group size, indicating an optimal
group size for the task. The efficiency is lower for the group with the higher
interference strength, i.e., the larger avoiding time parameter (solid line).
Moreover, the greater the effect of interference (larger $\tau$), the smaller the
optimal group size.

The  final  plot  (Fig.  12(b))  shows  that  for  this  variant  of  the  foraging  task 
interference  causes  the  per-robot  efficiency  to  monotonically  decrease  with  group 
size  —  adding  a  robot  to  the  group  decreases  the  performance  of  all  robots, 
though  if  the  initial  group  size  was  less  than  the  optimal  size,  adding  a  robot 
will  increase  the  overall  efficiency  of  the  group. 
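
The efficiency measure defined above can be computed directly from the model. The sketch below is self-contained, so it repeats a forward-Euler integration of Eqs. 25-28 (an assumed integrator) and stops when 80% of the pucks have been delivered. Eq. 29 is read as $\tau_h = \tau_h^0 [1 + \alpha_r \tau N_0]$, the default parameter values are those quoted for Figure 12, and all function names are hypothetical.

```python
def time_to_collect(n0, tau, m0=20.0, tau_h0=16.0, alpha_p=0.02,
                    alpha_r=0.04, alpha_r_home=0.08,
                    fraction=0.8, dt=0.01, t_max=1.0e5):
    """Time at which `fraction` of the m0 pucks has been delivered."""
    tau_h = tau_h0 * (1.0 + alpha_r * tau * n0)   # Eq. 29 (as read here)
    n_s, n_h, n_h_av, m, t = float(n0), 0.0, 0.0, float(m0), 0.0
    target = (1.0 - fraction) * m0
    while m > target:
        if t > t_max:
            raise RuntimeError("target not reached before t_max")
        n_s_av = n0 - n_s - n_h - n_h_av          # conservation of robots
        pickup = alpha_p * n_s * (m - n_h - n_h_av)
        avoid_s = alpha_r * n_s * (n_s + n0)
        avoid_h = alpha_r_home * n_h * (n_h + n0)
        d_n_s = -pickup - avoid_s + n_h / tau_h + n_s_av / tau    # Eq. 25
        d_n_h = pickup - avoid_h + n_h_av / tau - n_h / tau_h     # Eq. 26
        d_n_h_av = avoid_h - n_h_av / tau                         # Eq. 27
        d_m = -n_h / tau_h                                        # Eq. 28
        n_s += dt * d_n_s
        n_h += dt * d_n_h
        n_h_av += dt * d_n_h_av
        m += dt * d_m
        t += dt
    return t


def efficiency(n0, tau, **params):
    """Group efficiency: inverse of the time needed to collect 80% of the pucks."""
    return 1.0 / time_to_collect(n0, tau, **params)


if __name__ == "__main__":
    for tau in (3.0, 1.0):
        for n0 in range(1, 11):
            eff = efficiency(n0, tau)
            print(f"tau={tau:.0f} s  N0={n0:2d}  "
                  f"efficiency={eff:.5f}  per robot={eff / n0:.5f}")
```

Dividing the group efficiency by $N_0$ gives the per-robot efficiency. Scanning the group size for each value of $\tau$ is one way to locate the optimal group size numerically.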




Figure 12: (a) Efficiency of different size robot groups, defined as the inverse of
the time it takes the group to collect 80% of the pucks in the arena, for $\tau = 3$ s
(solid line) and $\tau = 1$ s (dashed line), with $\tau_h^0 = 16$ s, $\alpha_p = 0.02$,
$\alpha_r = 0.04$, $\alpha'_r = 0.08$. (b) Efficiency per robot for different group sizes.


3.2.3  Comparison  with  simulations 

We  validate  the  mathematical  model  by  comparing  its  predictions  to  the  results 
of  foraging  simulations.  We  used  Player/Stage  to  simulate  the  foraging  task 
with  groups  of  robots.  Player/Stage  is  a  client/server-based  scalable  multi-robot 
simulator  developed  at  the  USC  Robotics  Lab  [14].  Player  is  a  network-based 
interface  to  the  onboard  sensors  and  actuators  that  constitute  a  robot,  while 
Stage  supports  virtual  Player  robots,  sensing  and  moving  in  a  two-dimensional 
bitmapped  world,  that  interact  with  simulated  devices.  Available  sensor  models 
include sonar, laser rangefinder, pan-tilt-zoom camera with color “blob” detection
and odometry.

The Stage world consists of a circular arena, with robots and pucks initially
randomly distributed around the arena. Each robot comes equipped with a
ring of 16 sonars, evenly distributed around its perimeter, for the purpose of
obstacle avoidance, a color camera and a vision system to locate “colored” pucks,
a gripper for picking up the puck, and an odometry system to help the robot find
“home” and move towards it. We simulated the foraging task in groups of one to
ten robots, each group tasked with collecting (or collecting and delivering home)
20 pucks. For each group of robots, we averaged the results of several, usually
ten, simulations. Simulation parameters are listed in Table 1.

Behavior  structure  The  robots’  behavior  structure  closely  replicates  that 
of  the  robots  studied  in  experiments  [15].  Behavior-based  control  governs  the 
actions  of  the  simulated  robots.  The  following  behaviors  were  used: 

0  Search  for  pucks:  robot  executes  a  random  walk  around  the  arena  until 
a  puck  is  found  with  a  camera.  The  puck  is  “painted”  some  bright  color, 
so  that  it  can  be  seen  with  a  color  camera.  The  size  of  the  puck  in  the 
robot’s  visual  field  must  exceed  some  minimum  detection  area  (in  pixels), 
before  the  robot  recognizes  it  as  a  puck. 

1 Collect pucks: under this behavior the robot will visually servo towards
the puck and collect it with a gripper. The gripper may fail to pick up
a puck with some small probability, consistent with failure under experimental
conditions due to unreliability of real grippers and sensor update
rates.

2  Go  home:  after  the  puck  has  been  collected,  the  robot  will  odometrically 
servo  towards  the  home  location  and  deposit  the  puck  there.  Home  is  a 
semicircular  region  centered  on  a  point  at  the  edge  of  the  arena. 

3  Reverse  homing:  the  robot  moves  away  from  home  a  specified  distance 
in  a  random  direction. 

4  Avoid  collisions:  If  a  close  obstacle  (another  robot  or  arena  wall)  is 
sensed  at  any  time,  the  robot  will  turn  away  from  the  obstacle  in  a  random 
direction  at  40  deg/s  for  a  time  specified  by  the  avoid  time  parameter. 

For  purposes  of  analysis  only,  we  split  behavior  4  into  two  distinct  behaviors: 
4 — avoiding  collisions  while  behaviors  0,  1  and  3  are  active,  and  5 — avoiding 
collisions while homing, i.e., when behavior 2 is active.

Parameter      Value      Parameter          Value
# of robots    1 - 10     avoid time         3 s
# of pucks     20         avoid dist         250 mm
robot radius   0.2 m      robot speed        300 mm/s
puck radius    0.05 m     min detect area    200 pixels
arena radius   3 m        rev. homing time   10 s
home radius    0.75 m

Table 1: Simulation parameters

A note on calculating parameters In the mathematical models presented
below, we will use a set of parameters to connect the model to experiments and
simulations. The main parameters we will use are $\alpha_p$ and $\alpha_r$, the rates at
which a robot encounters a puck and another robot, respectively. In principle,
these parameters can be computed ab initio by taking into account the details of
the robot’s dimensions and sensing capabilities in the following way: as a robot
travels through the arena, it sweeps out some area during time interval $dt$ and
will detect objects that fall in that area. This detection area is $v w_i\,dt$, where
$v$ is the robot’s speed and $w_i$ is the robot’s detection width for objects of type
$i$. The detection width is the sum of the sizes of the robot and the object it is
trying to detect, and the detection distance associated with the sensing hardware
it is using to detect that object (e.g., sonar, camera resolution, etc.). If the
arena radius is $R$, with $N_i$ objects of type $i$ distributed uniformly around it,
a robot will detect these objects at a rate $\alpha_i = v w_i N_i / \pi R^2$.
This idealization is useful for roughly estimating
model parameters, but because it omits all the details of the experiment (such
as sensor errors and failures), it does not reproduce them accurately. A better
way is to estimate them by fitting the model to experimental data, or to calibrate
the model by measuring these parameters experimentally or in simulation for
a single robot and using these values in the calculations. In order to estimate
$\alpha_r$ by calibration, for instance, we have to run the experiment or simulation for
two robots in an empty arena, keeping track of the number of times each robot
attempts collision avoidance maneuvers. Likewise, to estimate $\alpha_p$, we have to
run the experiment or simulation for a single robot and some pucks scattered
around the arena, keeping track of the rate at which the robot picks them up.
Although we did not perform these calibrations explicitly, we can estimate the
parameters from the simulation data: $\alpha_r = n_{collisions}/T_{total} = 0.06$ (note
that this number includes wall collisions), and $\alpha_p = n/(20 \cdot T_{total}) = 0.02$.
These  numbers  are  very  close  to  the  values  we  used,  which  we  determined  (by 
eye)  to  give  the  best  agreement  between  theory  and  simulations.  Note  that  this 
calibration  can  be  done  in  simulation  for  a  single  robot  for  an  environment  of 
arbitrary  complexity,  and  the  parameters  can  be  used  to  study  the  performance 
of  teams  of  robots  quantitatively  in  the  same  complex  environment. 
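
As a rough illustration of the ab initio estimate $\alpha_i = v w_i N_i / \pi R^2$ discussed above, the following sketch evaluates the formula for pucks. The speed, arena radius and puck count come from Table 1; the detection width passed in the usage example is a purely hypothetical value (the text does not give one), and the function name is an assumption.

```python
import math


def encounter_rate(speed, detection_width, n_objects, arena_radius):
    """Idealized detection rate alpha_i = v * w_i * N_i / (pi * R^2).

    Lengths are in metres and speed in metres per second, so the rate
    is in detections per second.
    """
    arena_area = math.pi * arena_radius ** 2
    return speed * detection_width * n_objects / arena_area


if __name__ == "__main__":
    v = 0.3            # robot speed: 300 mm/s (Table 1)
    radius = 3.0       # arena radius: 3 m (Table 1)
    n_pucks = 20       # number of pucks (Table 1)
    w_puck = 0.5       # HYPOTHETICAL detection width in metres, for illustration only
    print(f"idealized puck detection rate: "
          f"{encounter_rate(v, w_puck, n_pucks, radius):.3f} per second")
```

As the text notes, such an estimate ignores sensor errors and failures, so it is best treated as a starting point for fitting or calibration rather than as a final parameter value.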

We  ran  foraging  simulations  for  groups  of  one  to  ten  robots  and  twenty 
pucks  randomly  scattered  around  the  arena.  In  the  results  presented  below, 
we  split  the  avoiding  behavior  into  two  behaviors:  4 — avoiding  while  searching, 
collecting  pucks  and  reverse  homing,  and  5 — avoiding  while  the  homing  behavior 
is  active. 

Analysis of Results Table 2 lists the average amount of time (in seconds)
each robot spent in the active behaviors during the time it took the group
to collect the pucks and deliver them home. The last two columns list the average
number of times a robot attempted to avoid collisions, both while engaging in the
non-homing behaviors and while homing, during the time it took the group to
complete the task. Although in all cases all twenty pucks were collected, robots
were only able to deliver on average 19.14 ± 0.53 of them. This was caused by
excessive crowding near the home location. In the current implementation of
the simulator, robots see the already delivered pucks, and if there are no other
pucks left in the arena, the robots will all go home. Although reverse homing
acts to disperse robots, and eventually all pucks should be delivered, we did not
run the simulations long enough for this to happen. The total time in the results
presented below is, therefore, the time at which the last of the pucks was delivered.

rbts        0        1        2        3        4        5    colls   hcolls
   1   307.64   156.90   265.11   225.84    73.69    21.99     23.7      7.1
   2   118.68    81.07   170.02   101.89    46.70    45.08     15.1     13.5
   3    94.80    61.48   143.22    61.54    57.65    71.78     17.8     22.1
   4    50.98    39.71   131.51    34.06    55.84    99.85     15.9     29.5
   5    53.14    29.59   126.84    24.27    69.52   150.66     18.9     41.3
   6    67.05    28.89   139.68    20.32    94.26   224.40     22.0     53.3
   7   137.90    58.11   111.32    23.69   130.21   184.20     37.0     43.1
   8    80.94    32.94   133.35    17.06   123.05   265.56     30.1     62.3
   9    74.62    31.36   153.58    15.96   130.10   299.18     33.7     77.7

Table 2: Average time (in seconds) each robot spends in the active behaviors
during the foraging task (0: search, 1: collect, 2: home, 3: reverse home, 4:
avoid, 5: avoid while homing) as a function of robot group size. The last
two columns give, respectively, the average number of avoidance maneuvers per
robot while searching/collecting/reverse homing and while homing.

Figure  13(a)  graphically  displays  the  average  amount  of  time  each  robot 
spent  in  the  active  behaviors  while  foraging.  Fig.  13(b)  shows  the  fraction  of 
the total task time the robot was homing (behaviors 2 and 5 active). Note that
the  rate  of  increase  in  the  homing  time  per  robot  as  a  function  of  group  size 
appears  to  justify  our  assumption,  Eq.  29,  that  the  homing  time  increases  with 
the  size  of  the  group. 
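
The homing fraction discussed above can be recomputed from the Table 2 entries. The sketch below assumes that the total task time per robot is the sum of the six behavior columns and that "homing" means behaviors 2 and 5 together; the names are hypothetical and the numbers are copied from Table 2.

```python
# Average time (s) per robot in behaviors 0..5, keyed by group size (Table 2).
TABLE_2 = {
    1: (307.64, 156.90, 265.11, 225.84, 73.69, 21.99),
    2: (118.68, 81.07, 170.02, 101.89, 46.70, 45.08),
    3: (94.80, 61.48, 143.22, 61.54, 57.65, 71.78),
    4: (50.98, 39.71, 131.51, 34.06, 55.84, 99.85),
    5: (53.14, 29.59, 126.84, 24.27, 69.52, 150.66),
    6: (67.05, 28.89, 139.68, 20.32, 94.26, 224.40),
    7: (137.90, 58.11, 111.32, 23.69, 130.21, 184.20),
    8: (80.94, 32.94, 133.35, 17.06, 123.05, 265.56),
    9: (74.62, 31.36, 153.58, 15.96, 130.10, 299.18),
}


def homing_fraction(times):
    """Fraction of the summed behavior time spent in behaviors 2 and 5."""
    return (times[2] + times[5]) / sum(times)


def homing_fractions(table=TABLE_2):
    """Homing fraction for every group size in the table."""
    return {n_robots: homing_fraction(times) for n_robots, times in table.items()}


if __name__ == "__main__":
    for n_robots, frac in homing_fractions().items():
        print(f"{n_robots} robots: {100.0 * frac:5.1f}% of the time homing")
```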

Figure 13: (a) Average time each robot spent in the active behaviors during the
time it took the group to deliver all pucks vs. robot group size. (b) Percentage
of time each robot was homing as a function of group size.

3.3 Dynamic Task Allocation

We studied adaptive task allocation in multi-robot systems [18, 25, 28]. This
scenario is based on the foraging task. Consider an arena with some number
of pucks scattered about it. The pucks can be of two distinct types, Red and
Green. Each robot can be tasked to collect pucks of a specific type, say Red.
When the robot’s foraging state is set to Red, it is searching for and collecting
Red pucks. Each robot can also recognize the foraging state of other robots that
are visible to it. The robots have no a priori information about the shape of the
arena, the number of pucks left in it or the number of foraging robots. The goal
of adaptive task allocation is to design a robot controller that will allow robots to
dynamically adjust the division of labor, so that the number of robots searching
for Red and Green pucks will, over time, correctly reflect their prevalence. To
achieve this group behavior, each robot must be able to dynamically change its
foraging type.

The  solution  is  for  each  robot  to  count  the  number  of  pucks  of  each  type  in 
the  environment  as  well  as  the  number  of  robots  in  each  foraging  state  [18].  It 
does  so  by  observing  pucks  and  robots  that  are  visible  to  it  and  adding  these 
observations  to  history  (memory).  At  some  time  interval,  the  robot  uses  the 
history  array  to  estimate  the  fraction  of  pucks  and  robots  of  each  type,  and 
changes  its  foraging  state  according  to  a  transition  function. 

In order to experimentally demonstrate the dynamic task allocation mechanism
we made use of a physically-realistic simulation environment. Our simulation
trials were performed using the Player and Gazebo simulation environments.
Player [14] is a server that connects robots, sensors, and control programs over
a network. Gazebo [21] simulates a set of Player devices in a 3-D
physically-realistic world with full dynamics. Together, the two represent a
high-fidelity simulation tool for individual robots and teams that has been
validated on a collection of real-robot experiments using Player control programs
transferred directly to physical mobile robots. Figure 14 provides snapshots of the
simulation environment used. All experiments involved 20 robots foraging in a
400 m^2 arena.

The robots used in the experimental simulations are realistic models of the
ActivMedia Pioneer 2DX mobile robot. Each robot, approximately 30 cm in
diameter, is equipped with a differential drive, an odometry system using wheel
rotation encoders, and a 180-degree forward-facing laser rangefinder used for
obstacle avoidance and as a fiducial detector/reader. Each puck is marked with
a fiducial that marks the puck type and each robot is equipped with a fiducial
that marks the active foraging state of the robot. Note that the fiducials do not
contain unique identities of the pucks or robots but only mark the type of the
puck or the puck type a given robot is engaged in foraging. Each robot is also
equipped with a 2-DOF gripper on the front, capable of picking up a single 8 cm
diameter puck at a time. There is no capability available for explicit, direct
communication between robots, nor can pucks and other robots be uniquely
identified.

Figure 14: Snapshots from the simulation environment used. (left) An overhead
view of the foraging arena and robots. (right) A closeup of robots and pucks.


3.3.1  Mathematical  Model  of  Dynamic  Task  Allocation 

In a high-level, coarse-grained description, each robot can be considered to
belong to either the Green or the Red foraging state during a sufficiently short
time interval. In reality, each state is composed of several robot actions and behaviors,
such  as  wandering  the  arena,  detecting  pucks,  avoiding  obstacles,  etc.  However, 
since  we  want  the  model  to  capture  how  the  fraction  of  robots  in  each  foraging 
state  evolves  in  time,  it  is  a  sufficient  level  of  abstraction  to  consider  only  these 
states.  If  we  find  that  additional  levels  of  detail  are  required  to  explain  robot 
behaviors,  we  can  elaborate  the  model  by  breaking  each  of  the  high  level  states 
into  its  underlying  components. 

A  robot  uses  information  in  its  history  to  make  a  transition  between  states. 
A  robot  makes  a  transition  to  Red  foraging  state  according  to  a  transition 
function  that  depends  on  the  difference  between  the  estimated  fraction  of  Red 
robots  and  Red  pucks;  otherwise  it  makes  a  transition  to  the  Green  state. 
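
The history-based decision rule described above can be sketched as follows. This is a minimal illustration, not the controller used in the experiments: the class and method names are hypothetical, the history is modeled as a fixed-length queue, and the transition function is passed in as a placeholder `transition_prob` that maps the difference between the estimated Red puck and Red robot fractions to a probability of choosing the Red state.

```python
import random
from collections import deque


class ForagingRobot:
    """Hypothetical robot that keeps a bounded history of observations."""

    def __init__(self, state="Red", history_length=10):
        self.state = state
        # Oldest observations are dropped once the history is full.
        self.puck_history = deque(maxlen=history_length)
        self.robot_history = deque(maxlen=history_length)

    def observe_puck(self, color):
        self.puck_history.append(color)

    def observe_robot(self, foraging_state):
        self.robot_history.append(foraging_state)

    @staticmethod
    def _fraction_red(history):
        if not history:
            return None  # nothing observed yet
        return sum(1 for item in history if item == "Red") / len(history)

    def estimates(self):
        """Estimated (Red puck fraction, Red robot fraction) from the history."""
        return (self._fraction_red(self.puck_history),
                self._fraction_red(self.robot_history))

    def update_state(self, transition_prob, rng=random):
        """Switch to Red with probability transition_prob(m_red - n_red), else Green."""
        m_red, n_red = self.estimates()
        if m_red is None or n_red is None:
            return self.state  # not enough information; keep the current state
        p_red = transition_prob(m_red - n_red)
        self.state = "Red" if rng.random() < p_red else "Green"
        return self.state


if __name__ == "__main__":
    robot = ForagingRobot(state="Green", history_length=4)
    for color in ("Red", "Red", "Red", "Green"):
        robot.observe_puck(color)
    for other in ("Green", "Green"):
        robot.observe_robot(other)
    # Placeholder transition function: clamp the difference to [0, 1].
    print(robot.update_state(lambda diff: min(1.0, max(0.0, diff))))
```

Using a bounded queue means that older observations are forgotten, so the history length controls how quickly the estimates respond to changes in the environment.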

Let $N_R(t)$ and $N_G(t)$ be the number of robots in the Red and Green foraging
states respectively at time $t$, and $M_R(t)$ and $M_G(t)$ be the number of uncollected
Red and Green pucks in the arena. These dynamic variables correspond to
quantities that have been averaged over many experiments or simulations. The
following set of differential equations governs how the average numbers of robots
and pucks evolve in time (footnote 5):

\begin{align*}
\frac{dN_R}{dt} &= \alpha_R(t) N_G(t) - \alpha_G(t) N_R(t), \\
\frac{dM_R}{dt} &= -\beta_R M_R(t) N_R(t) + \mu_R.
\end{align*}


Due to conservation of robots, $N_G = N - N_R$, where $N$ is the total number
of robots (likewise, $M_G = M - M_R$, where $M$ is the total number of pucks).
Quantities $\alpha_R$ and $\alpha_G$ govern the rate at which robots switch to the Red
and Green states respectively. In an adaptive system, these are time-dependent.
Parameter $\beta_R$ is the rate at which robots encounter Red pucks, while $\mu_R$ is
the rate at which new Red pucks are deposited in the arena (likewise for Green
pucks). For simplicity, $\mu_R$ and $\mu_G$ are such that the total number of pucks
remains constant. Experimentally, this is realized by replacing a puck at a new
random location after a robot picks it up.

Footnote 5: The differential equations describing the evolution of a dynamical
system are usually derived as a continuous limit of a discrete-time difference
equation, for example: $N_R(t + 1) = N_R(t) - \alpha_G \Delta t\, N_R(t) + \alpha_R \Delta t\, N_G(t)$.
The problem with this approach is that it models a synchronous system, where
all robots make decisions at the same time. Although feasible, such a model is
not realistic; moreover, most choices of transition rates $\alpha_R$ and $\alpha_G$ lead to
severe oscillations in the dynamic variables. The differential equation model we
are working with is derived from the stochastic master equation, and is
applicable to asynchronous systems.

It is more convenient to work with the average density, $n_R = N_R/N$, rather
than the number of robots. Also, we may safely ignore the equations for pucks,
because these quantities do not enter the equations describing the time evolution
of the number of robots. Dividing both sides of the equation by $N$, the total
number of robots, reduces it to:

\[ \frac{dn_R}{dt} = \alpha_R n_G(t) - \alpha_G n_R(t). \tag{30} \]

Likewise, the densities of Red and Green pucks are $m_R = M_R/M$ and
$m_G = M_G/M$.


Transition Rates Equation 30 is a special case of the rate equation, Eq. 7,
describing an adaptive system, with $\alpha_R$ and $\alpha_G$ representing the
history-averaged transition rates $\langle W \rangle_h$. At regular time intervals, the robot
looks at the history of observations and estimates the density of Red pucks and
of robots in the Red state. In general, the transition probability should be a
function of $m_R - n_R$, the difference between the estimated fractions of pucks
and robots in a particular state (degenerate to the choice of $R$ or $G$).

At the collective level of Equation 30, the macroscopic transition rates $\alpha_R$
and $\alpha_G$ are in fact simply averaged microscopic transition probabilities:

\begin{align}
\alpha_G &= \varepsilon \langle f(\hat n_R - \hat m_R) \rangle_{P(\hat n_R, \hat m_R)} \notag \\
\alpha_R &= \varepsilon \langle f(\hat n_G - \hat m_G) \rangle_{P(\hat n_G, \hat m_G)} \tag{31}
\end{align}

where $\varepsilon$ assures the proper time scale, $\langle \ldots \rangle_P$ stands for averaging over
the distribution $P$, and $P(\hat n, \hat m)$ is the joint probability that a robot has
observed the fraction of robots and pucks of a corresponding color to be $\hat n$
and $\hat m$ respectively. We note that for sufficiently large history lengths one can
approximate $P(\hat n, \hat m)$ by a distribution sharply peaked around its mean
$(\langle \hat n \rangle, \langle \hat m \rangle)$. This suggests that if the microscopic transition
functions are smooth enough, then the effect of averaging is to replace the
estimated values of densities with their mean values (in the case of the step
function, the effect of averaging is to smear out the discontinuity).

A  steady  state  is  one  in  which  the  densities  of  robots  in  Red  or  Green 
states  no  longer  change.  Existence  of  the  steady  state  is  of  prime  interest  to 
the  designer,  because  if  a  system  has  one,  we  can  reliably  predict  its  long  term 
behavior.  In  the  adaptive  task  allocation  problem,  the  desired  steady  state  is 
one  in  which  the  distribution  of  robots  is  equal  to  the  distribution  of  pucks, 
namely,  nRtSS  =  mR  and  nGtSS  =  m-G-  In  our  previous  work  [25]  we  showed  that 
in  order  to  achieve  the  desired  steady  state,  the  transition  rates  must  have  the 
following  functional  form: 

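\[ \alpha_R(\hat{m}_r, \hat{n}_r) = \hat{m}_r \, g(\hat{m}_r - \hat{n}_r), \qquad (32) \]

\[ \alpha_G(\hat{m}_r, \hat{n}_r) = \hat{m}_g \, g(\hat{m}_g - \hat{n}_g) = (1 - \hat{m}_r) \, g(-\hat{m}_r + \hat{n}_r). \qquad (33) \]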

Here  g(z)  is  a  continuous,  monotonically  increasing  function  of  its  argument 
defined  on  an  interval  [—1, 1].  We  consider  the  following  forms  for  g(z): 

•  Power: $g(z) = 100^z / 100$
•  Stepwise linear: $g(z) = z \, \Theta(z)$ (see footnote 6).
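The transition rates of Eqs. (32) and (33) can be sketched in a few lines. This
is a hedged illustration, not the authors' code: the power form is read here as
100^z / 100 (the printed formula is damaged, so this reading is an assumption),
and the two-state balance used in `net_flow_to_red` (Green robots switch to Red
at rate alpha_R, Red robots switch to Green at rate alpha_G) is an assumed form
of the macroscopic model, introduced only to check the steady state.

```python
def g_power(z):
    """Power form, read here as 100**z / 100 (assumed reconstruction)."""
    return 100.0**z / 100.0

def g_stepwise_linear(z):
    """Stepwise linear form: z * step(z), where step(z) = 1 if z > 0 else 0."""
    return z if z > 0 else 0.0

def alpha_red(m_r, n_r, g):
    """Eq. (32): rate of switching to Red, given estimated fractions."""
    return m_r * g(m_r - n_r)

def alpha_green(m_r, n_r, g):
    """Eq. (33): rate of switching to Green, using m_g = 1 - m_r."""
    return (1.0 - m_r) * g(n_r - m_r)

def net_flow_to_red(m_r, n_r, g):
    """Assumed two-state balance: gain from Green minus loss from Red."""
    return alpha_red(m_r, n_r, g) * (1.0 - n_r) - alpha_green(m_r, n_r, g) * n_r

if __name__ == "__main__":
    for g in (g_power, g_stepwise_linear):
        # Too few Red robots gives a positive flow; n_r = m_r gives zero flow.
        print(g.__name__, net_flow_to_red(0.8, 0.3, g), net_flow_to_red(0.8, 0.8, g))
```

Under these assumptions the net flow vanishes exactly when the Red robot
fraction matches the Red puck fraction, for either choice of g.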

3.3.2  Comparison  with  Simulations 

Figure 15 shows results of embodied simulations (solid lines) as well as solutions
to the stochastic version [28] of the model (dashed lines) for different values of
robot history length and forms of transition function (given by Eqs. (32) and (33),
with $g(z)$ linear or power function). Initially, the Red puck fraction (dotted line)
is 30%. It is changed abruptly at t = 500 s to 80% and then again at t = 2000 s
to 50%. Each solid line showing Red robot density has been averaged over
10 runs. We rescale the dimensionless time of the model by the parameter 10,
corresponding to $\epsilon = 0.1$. The history length was the only adjustable
parameter used in solving the equations. The values of h used to compute the
observed fraction of Red robots were h = 2, 8, 16, corresponding to experimental
history lengths 10, 50, 100 respectively. For $\hat{m}_r$, the observed fraction
of Red pucks, we used their actual densities.

Solutions exhibit oscillations, although eventually oscillations decay and
solutions relax to their steady state values. In all cases, the steady state value is
the same as the fraction of red pucks in the arena. History-induced oscillations
are far more pronounced for the linear transition function (Figure 15(a)) than for
the power transition function (Figure 15(b)). For the power transition function,
these oscillations are present but become evident only for longer history lengths.
This behavior is probably caused by the differences between the values of
transition functions near the steady state: while the value of the power transition
function remains small near the steady state, the value of the linear transition
function grows linearly with the distance from the steady state, thereby
amplifying any deviations from the steady state solution. The amplitude and period
of oscillations and the convergence rate of solutions to the steady state all
depend on history length, and it generally takes longer to reach the steady state
for longer histories. Another conclusion is that the linear transition function
converges to the desired distribution faster than the power function, at least for
moderate history lengths.

3.4  Target  Localization  with  Microscopic  Robots 

Let us consider a D-dimensional volume with multiple targets that release a
certain chemical into the environment. The task of the microscopic swarm is to
aggregate at these targets in order to carry out some actions in the vicinity of
the targets. This capability is fundamental to many medical applications
envisioned for these microscopic robots. For example, the volume of fluid may be a
blood vessel that has been damaged. Robots are required to aggregate at the
injury site in order to assist in healing, forming clots, etc.

6 The step function $\Theta$ is defined as $\Theta(z) = 1$ if $z > 0$; otherwise, it is 0.
The step function guarantees that no transitions to Red state occur when
$\hat{m}_r < \hat{n}_r$.

Figure 15: Evolution of the fraction of Red robots for different history lengths
and transition functions, compared to predictions of the model. Panels: (a) Linear
transition function; (b) Power transition function. Plot labels: History length 10,
History length 100. Horizontal axis: Simulation Time (seconds).

We consider a simple robot controller that on a high level can be thought of as
consisting of three discrete states described below:

State 1 (search): Do a biased random walk in the direction of the
communicative signal concentration gradient.

State 2 (communicate): Move towards the chemical source following the
concentration gradient of the target chemical and release a communicative
signal to other robots.

State 3 (disperse): Move away from the target in the direction opposite to
the target chemical's concentration gradient for some specified time $\tau$.

To fully specify a robot's behavior, we also have to describe the transitions
between these states. The robots start out in State 1, searching for targets
using random diffusive motion and following the gradient of the communicative
signal. Once the concentration of the target chemical at a certain point in space
is sufficiently high, the robot at that point will switch to State 2: it will start
moving towards regions of high concentration (using biased diffusion or gradient
following) while releasing a new chemical which acts as a communication signal
to attract other robots. With some probability (that can be fixed, or dependent
on the concentration of the robots at the source), robots in State 2 will
switch to State 3, where they will disperse from the source, moving in the
direction opposite to the gradient. Finally, robots in State 3 will switch to
the searching state with probability $1/\tau$. The last behavior ensures that robots
will not be stuck at local maxima of the chemical potential.
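The transition logic described above can be sketched as a small state machine.
This is a minimal illustration under stated assumptions: time is discretized
into steps, the State 2 to State 3 switching probability is taken as a fixed
number `p_disperse`, and all names (`State`, `next_state`, `threshold`) are
hypothetical rather than taken from the paper. Motion itself is not modelled.

```python
import random
from enum import Enum

class State(Enum):
    SEARCH = 1       # biased random walk along the communicative signal
    COMMUNICATE = 2  # follow the target chemical and release the signal
    DISPERSE = 3     # move against the target chemical's gradient

def next_state(state, target_conc, threshold, p_disperse, tau, rng=random):
    """Return the robot's state for the next time step.

    target_conc: local concentration of the target chemical
    threshold:   concentration above which a searching robot switches
    p_disperse:  assumed fixed per-step probability of State 2 -> State 3
    tau:         dispersal time scale; State 3 -> State 1 with probability 1/tau
    """
    if state is State.SEARCH:
        return State.COMMUNICATE if target_conc > threshold else State.SEARCH
    if state is State.COMMUNICATE:
        return State.DISPERSE if rng.random() < p_disperse else State.COMMUNICATE
    # DISPERSE: return to searching with probability 1/tau per step
    return State.SEARCH if rng.random() < 1.0 / tau else State.DISPERSE

if __name__ == "__main__":
    s = State.SEARCH
    for conc in (0.001, 0.02, 0.02, 0.02):
        s = next_state(s, conc, threshold=0.01, p_disperse=0.1, tau=5.0)
        print(conc, s.name)
```

A concentration-dependent dispersal probability, as the text allows, would
replace the constant `p_disperse` with a function of the local robot density.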

3.4.1  Mathematical  Model  of  Target  Localization  Using  Chemical 
Fields 

Let us denote by $n_1(x)$, $n_2(x)$, $n_3(x)$ the fraction of robots in each state
at point $x$, with normalization condition

\[ n_1(x) + n_2(x) + n_3(x) = 1. \]

Let $p(x)$ and $c(x)$ be the concentrations of the chemical released from the targets
and the communicative signal released by robots in State 2. We also denote by
$\mathbf{V}_D^p$ and $\mathbf{V}_D^c$ the robots' drift velocity in the concentration
gradients of the chemical (released by the targets) and the communicative signal
(released by the robots), respectively. Then the set of equations describing the
evolution of the system is as follows:

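\[ \frac{\partial n_1}{\partial t} = D_1 \nabla^2 n_1 - \mathbf{v} \cdot \nabla n_1 - \nabla \cdot [\mathbf{V}_D^c n_1] - n_1 F(p) + \frac{n_3}{\tau} \qquad (34) \]

\[ \frac{\partial n_2}{\partial t} = D_2 \nabla^2 n_2 - \mathbf{v} \cdot \nabla n_2 - \nabla \cdot [\mathbf{V}_D^p n_2] + n_1 F(p) - G(n_2, p, c) \, n_2 \qquad (35) \]

\[ \frac{\partial n_3}{\partial t} = D_3 \nabla^2 n_3 - \mathbf{v} \cdot \nabla n_3 + \nabla \cdot [\mathbf{V}_D^p n_3] + G(n_2, p, c) \, n_2 - \frac{n_3}{\tau} \qquad (36) \]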

where $F(p)$ is the concentration-dependent transition rate from State 1 to State
2, $G(n_2, p, c)$ is the transition rate from State 2 to State 3, and $1/\tau$ is the
probability that a robot in State 3 will switch to State 1.

Equations 34-36 have a simple intuitive interpretation. The first and third terms
in Eq. 34 describe the robots' motion in State 1: diffusive searching and following
the communicative signal, if present. The second term describes the drift in the
flow. The fourth term describes transitions to State 2 at the rate $F(p)$, which
depends on the concentration of the target field. The last term describes the
transition of robots from State 3 to State 1 after the robots have moved in the
direction opposite to the concentration gradient for a period of time $\tau$.
$G(n_2, p, c)$ is the rate at which robots transition from State 2 to State 3, and it
could in principle depend on the local concentrations of the chemicals, as well as
the number of agents present at the target site, for example, when the presence of
a certain minimum number of robots is required for executing an action.

We  have  to  complement  these  three  equations  with  two  more  to  account  for 
the  evolution  of  chemicals  p  and  c  as  follows: 

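\[ \frac{\partial p}{\partial t} = D_p \nabla^2 p - \mathbf{v} \cdot \nabla p + \sum_{i=1}^{M} Q_i \, \delta(x - x_i) - \gamma_p p \qquad (37) \]

\[ \frac{\partial c}{\partial t} = D_c \nabla^2 c - \mathbf{v} \cdot \nabla c + q_c n_2 - \gamma_c c \qquad (38) \]

In Eq. 37, $x_i$, $i = 1, 2, \ldots, M$ are the locations of the target sources, $Q_i$
is the intensity of source $i$, and $\gamma_p$ is the decay rate of the target
chemical. Similarly, in Eq. 38, $q_c$ is the intensity of the communication signal
released by a robot in State 2, while $\gamma_c$ is the decay rate of the signal.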

Simplification: 1-dimension  In this section we present results for a 1D
geometry and a single target scenario. We consider the case when the liquid
flow is very slow compared to other time scales so we can set $\mathbf{v} = 0$. Also,
since there is only one target, we neglect the third (dispersing) behavior so that
the two possible states are State 1 ("search") and State 2 ("communicate"). The
target is located at x = 1 and serves as a point source for the chemical. We assume
that the diffusion of the chemical happens much faster than the robots'
diffusion, and that it quickly reaches its steady state profile. Hence, the equation
for the evolution of $p(x, t)$ can be solved separately, with a solution

\[ p(x, \infty) \equiv p(x) = Q_0 \, e^{-\sqrt{\gamma_p / D_p} \, |1 - x|}, \quad 0 < x < 1. \qquad (39) \]

For the results presented here we used $Q_0 = 0.1$, $D_p = 0.2$ and $\gamma_p = 0.5$.

All robots start at State 1 and are initially localized at x = 0. We assume
that a transition from State 1 to State 2 happens whenever a robot in State
1 detects the target's chemical above a certain threshold level $p_0$, so that the
transition rate is $F(p) = \theta(p - p_0)$, where $\theta(x)$ is the step function,
$\theta(x) = 1$ if $x > 0$ and $\theta(x) = 0$ otherwise. While in State 2, robots
move in the chemical gradient with a constant drift velocity $V_D$ and release a
communicative signal with intensity $q_c$.

To proceed further, we need to specify the dependence of the drift velocity
in State 1 on the concentration of the communicative signal $c$. Again, we assume
that once a robot detects the communicative signal above a certain threshold $c_0$,
it propels itself through the fluid in the direction of the gradient with a constant
drift velocity $V_D$. Then the dynamics of the system are described by the following
system of equations:


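\[ \frac{\partial n_1}{\partial t} = D_n \frac{\partial^2 n_1}{\partial x^2} - \frac{\partial}{\partial x} \left[ V_D \, \theta(c - c_0) \, \mathrm{sgn}\!\left( \frac{\partial c}{\partial x} \right) n_1 \right] - F(p) \, n_1 \qquad (40) \]

\[ \frac{\partial n_2}{\partial t} = D_n \frac{\partial^2 n_2}{\partial x^2} - V_D \frac{\partial n_2}{\partial x} + F(p) \, n_1 \qquad (41) \]

\[ \frac{\partial c}{\partial t} = D_c \frac{\partial^2 c}{\partial x^2} + q_c n_2 - \gamma_c c \qquad (42) \]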

Analysis of Results  To study the effect of different design parameters on
the aggregation behavior of the robots at the target, we solved the system Eq. 40-42
numerically. We used the following parameters (in dimensionless units):
$D_n = 0.01$, $D_c = 0.05$, $V_D = 0.1$, $q_c = 0.1$, $\gamma_c = 0.01$. For the
detection thresholds we used $c_0 = 0.001$ and $p_0 = 0.01$, the latter ensuring that
robots detect the chemical approximately midway in the interval [0, 1]. We used
reflective boundary conditions for $n_1$ and $n_2$,
$\partial n_1 / \partial x |_{0,1} = \partial n_2 / \partial x |_{0,1} = 0$, and
absorbing boundary conditions for $c$, $c(0) = c(1) = 0$.
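A numerical solution of this kind can be sketched with an explicit finite-volume
scheme. The code below is an illustration only and is not the solver used for
the reported results. Assumptions introduced here: NumPy is available; the
State 1 drift is V_D times a step in (c - c_0) times the sign of the signal
gradient; the target profile uses the square-root exponent; the second listed
signal parameter is read as the decay rate gamma_c; robots start in the first
grid cell; grid size and time step are arbitrary choices that satisfy the usual
stability limits.

```python
import numpy as np

def step_fn(z):
    """Step function: 1 where z > 0, else 0."""
    return (z > 0).astype(float)

def conservative_step(n, D, v_face, dx, dt):
    """One explicit step of diffusion plus upwind drift, zero flux at both walls."""
    diffusive = -D * (n[1:] - n[:-1]) / dx
    advective = np.where(v_face > 0, v_face * n[:-1], v_face * n[1:])
    flux = np.concatenate(([0.0], diffusive + advective, [0.0]))
    return n - dt * (flux[1:] - flux[:-1]) / dx

def simulate(V_D=0.1, q_c=0.1, t_end=10.0, cells=50, dt=0.002,
             D_n=0.01, D_c=0.05, gamma_c=0.01, c0=0.001,
             p0=0.01, Q0=0.1, D_p=0.2, gamma_p=0.5):
    dx = 1.0 / cells
    x = (np.arange(cells) + 0.5) * dx             # cell centres on [0, 1]
    p = Q0 * np.exp(-np.sqrt(gamma_p / D_p) * np.abs(1.0 - x))
    F = step_fn(p - p0)                           # State 1 -> State 2 rate
    n1 = np.zeros(cells)
    n1[0] = 1.0 / dx                              # all robots start near x = 0
    n2 = np.zeros(cells)
    c = np.zeros(cells)
    v2 = np.full(cells - 1, V_D)                  # State 2 drifts towards x = 1
    for _ in range(int(round(t_end / dt))):
        c_face = 0.5 * (c[1:] + c[:-1])
        v1 = V_D * step_fn(c_face - c0) * np.sign(c[1:] - c[:-1])
        switching = F * n1
        n1_next = conservative_step(n1, D_n, v1, dx, dt) - dt * switching
        n2_next = conservative_step(n2, D_n, v2, dx, dt) + dt * switching
        padded = np.concatenate(([-c[0]], c, [-c[-1]]))   # c = 0 at both walls
        laplacian = (padded[2:] - 2.0 * c + padded[:-2]) / dx**2
        c = c + dt * (D_c * laplacian + q_c * n2 - gamma_c * c)
        n1, n2 = n1_next, n2_next
    return x, n1, n2, c

if __name__ == "__main__":
    x, n1, n2, c = simulate()
    print("density in the cell next to the target:", (n1 + n2)[-1])
```

Because the fluxes vanish at the walls and the switching terms cancel, the
scheme conserves the total number of robots, which gives a simple sanity check.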

In Fig. 16 we plot the spatio-temporal evolution of robots' densities with and
without communication. Clearly, the density peak at x = 1 is stronger for the
system with communicative behavior. This suggests that communication indeed
helps the robots to aggregate better. In addition, the aggregation process with
communication happens faster than without communication. This is also shown
in Fig. 17, where we plot the density of robots at x = 1 as a function of time
for three different cases: free diffusion ($V_D = 0$; see footnote 7), gradient
following without communication ($V_D \neq 0$, $q_c = 0$), and gradient following
with communication ($V_D, q_c \neq 0$). As can be seen from Fig. 17, the systems with
gradient following and communicative behavior do demonstrate aggregative behavior,
and it is more pronounced for the system with communication. For instance, at time
t = 10 the robot density at x = 1 with communication is more than 3 times
higher than in the non-communicating case.

7 Note that the absence of aggregation for free diffusing robots is due to reflective
boundary conditions at the source for $n_1$ and $n_2$. If one employs absorbing
boundary conditions instead, robots will demonstrate aggregative behavior even with
free diffusion.

Figure 16: Time evolution of robot densities without communication (a) and
with communication (b)

Figure 17: Time evolution of robot densities at x = 1 for three different
strategies

One of the design objectives is to have robots aggregate at the target fast
enough, while at the same time not dissipating too much power due to the
propelling. To examine this tradeoff, let us consider the dependence of the
aggregation time (defined as the time needed for a fraction $n_0$ of robots to reach
the vicinity of the target, which we define as the interval [0.95, 1]) on the drift
velocity $V_D$. In Fig. 18 we plot aggregation time vs $V_D$ for three different
values of $n_0$. One observes that when the drift velocity is increased from
$V_D = 0$, the aggregation time decreases monotonically, with a steeper decline for
larger $n_0$. However, it soon "saturates", so that increasing $V_D$ further has very
little effect on the aggregation time. This is because for large values of
$V_D / D_n$, the aggregation time is mainly dominated by the time required for robots
to diffuse and detect the chemical gradient, and increasing $V_D$ clearly does not
have any effect on this time. Hence, depending on the desired number of robots in
the vicinity of the target, as well as the required aggregation time, the best
strategy for robots might be to have a moderate drift velocity. Note that this type
of analysis can be used to assess the energy-efficiency of various behaviors since
the power required to propel a robot through a fluid with velocity $V_D$ scales
with $V_D^2$.
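The aggregation time defined above can be read off a simulated time series. The
helper below is a hypothetical sketch: it assumes the fraction of robots inside
the target vicinity [0.95, 1] has already been computed at each sampled time,
and it returns the first sampled time at which that fraction reaches the
requested level, without interpolating between samples.

```python
def aggregation_time(times, fractions_near_target, n0):
    """First sampled time at which the fraction near the target reaches n0.

    times:                 increasing sequence of sample times
    fractions_near_target: fraction of robots in [0.95, 1] at each time
    n0:                    required fraction of robots
    Returns None if the fraction never reaches n0 within the samples.
    """
    for t, fraction in zip(times, fractions_near_target):
        if fraction >= n0:
            return t
    return None

if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]
    fractions = [0.0, 0.05, 0.2, 0.45, 0.6]
    print(aggregation_time(times, fractions, 0.25))  # first time with >= 25 %
```

Repeating this for several drift velocities gives a curve of the kind described
for Fig. 18.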

4  Distributed  Resource  Allocation  in  the  Grid 

Grid computing is an emerging technology that enables users to share a large
number of computing resources distributed over a network. The dynamic,
federating nature of Grid policy environments is dominated by virtual organizations
(VOs) which associate heterogeneous users and resource providers. Users have
resource-consuming activities, or jobs, that must be mapped to specific resource
providers through a resource allocation mechanism. The resource allocation
mechanism may choose among alternate mappings in order to optimize some
utility metric, within the bounds permitted by the VO policy environment.
Figure 18: Aggregation time as a function of drift velocity, for three different
values of $n_0$.


It is envisioned that deployment of Grid technology will grow from its
current modest scale to eventually overlay the global Web. It is not known how
large individual VOs will be, but it is reasonable to imagine resource sharing
among populations with tens of thousands of users and thousands of resources.
Hence, allocation mechanisms need to be highly scalable and robust to localized
failures in resources and communication paths. From the perspective of a
single VO, the dynamic policy environment can be viewed as the dynamic arrival
and departure of users and resources (occurring at a higher rate than users and
resources actually associate with and disassociate from the global Grid
infrastructure). Some very large VOs may have an overlaid hierarchical structure,
but this structure does not necessarily map to an underlying physical or geographic
hierarchy. Scalable Grid allocation mechanisms need to focus on the VO policy
environment rather than physical locations.

Although there has been considerable attention given to the resource
allocation problem in the Grid, very few researchers have addressed the problem
from the perspective of learning and adaptation. Meanwhile, the multi-agent
systems (MAS) and distributed AI communities have shown that groups of
autonomous learning agents can successfully solve different load balancing and
resource allocation problems [41, 12]. The goal of this paper is to apply
multi-agent learning techniques to the problem of resource allocation in the Grid.
The MAS approach is well suited for describing the Grid, because the distributed,
autonomous nature of agents (Grid users and resources) reflects the federated
nature of the Grid. Introducing learning allows the multi-agent system to adapt
to changes, such as changing resource capacities, resource failure, or the
introduction of new agents into the system. Furthermore, we believe that the MAS
approach will prove useful for policy design, because it can be used to study
the performance of a VO implementing a given resource allocation strategy to
verify that it does not lead to any unintended global consequences.

4.1  Grid  Scheduling  Issues 

Due to decentralized Grid policies, portions of the Grid may use different
allocation strategies, and a centralized allocation manager is not feasible. However,
the Grid vision assumes that standard mechanisms will be deployed which can
be configured with appropriate localized policies. To a large degree, traditional
scheduling systems are distinguished by their strategy, as embodied in
algorithms and deployment parameters. The wide deployment of the GRAM [48]
job-submission interface has demonstrated that contemporary job scheduling
systems are architecturally consistent. Previous Grid architecture work leads us
to believe individual users, as well as brokering intermediaries, will apply
allocation strategies to their own jobs in addition to the traditional resource
providers making allocation decisions for sets of jobs onto large (high-performance
or aggregate) resources. Hence, it is imperative to understand what the impact of
these decisions will be on the efficiency of overall resource utilization in the
system. Understanding the effects of different resource allocation mechanisms on
global system behavior will influence architectural decisions as well as the
policies chosen within federated VOs. In this paper we examine a specific case when
the allocation decisions by individual users are based on reinforcement learning
and study the global performance of a VO implementing this mechanism.

A further challenge to Grid resource allocation lies in the lack of accurate
resource status information at the global scale. The allocation strategies
employed by users and brokers have limited real-time environment knowledge at
their disposal. This suggests that feasible allocation mechanisms should not
depend strongly on the availability of current global knowledge. The
multi-agent learning approach studied here relies on minimal monitoring capabilities
to compare resources, only requiring that the agent obtain status signals for
job requests issued by the same agent. However, a simplifying assumption in
our simulations is that an existing discovery and policy-introspection system
permits the agents to scope their internal model of available resources to an
appropriate rough set.

4.2  Multi-Agent  Reinforcement  Learning 

Reinforcement learning (RL) [44] is a powerful framework in which an agent,
for example, a Grid user, learns optimal actions through a trial and error
exploration of the environment and by receiving rewards for its actions. The reward
(utility) function defines what the good and bad actions are in different
situations. The agent's goal is to maximize the total reward it receives. For a single
agent in a stationary environment, the problem reduces to finding the optimal
policy. In the multi-agent setting, however, the environment is highly dynamic
because of the presence of other learning agents, and the usual conditions for
convergence to an optimal policy do not necessarily hold. Nevertheless,
various generalizations of single agent learning algorithms have been successfully
applied to multi-agent settings.

We construct a multi-agent model of resource allocation for the Grid that is
simplified, yet maintains the main features of the Grid environment:
heterogeneity of dynamic, large-scale populations of users and resources. In our
system, a large number of users submit jobs to one of the resources, where the jobs
are scheduled by a local scheduler according to local policies. The users are
modelled as rational, selfish agents that try to maximize their utilities (i.e.,
complete their jobs in the shortest possible time). The agents have no prior
knowledge about the resources. Instead, they utilize a simple reinforcement
learning scheme to estimate the efficiency of different resources based on their
past experience. We analyze the global behavior of the system by numerical
simulations, and compare it with a baseline algorithm that makes use of global
knowledge of current resource loads. Our results illustrate that reinforcement
learning can be used to improve the quality of resource allocation in a large
scale heterogeneous system.

4.3  The  Model 

In real Grid applications the problem of mapping resources to specific jobs can
be very complex, and may require co-allocation of different resources such as
a specific amount of CPU hours, system memory, network bandwidth for data
transfer, etc. In this paper we neglect the need for co-allocation, and assume
that jobs generated by a user require only a certain CPU time, so that they are
uniquely characterized by their duration.

4.3.1  Resource Providers

The  local  scheduling  of  computational  tasks  is  a  challenging  problem  in  itself. 
Usually,  resources  are  characterized  by  the  number  and  speed  of  the  processors 
available,  system  memory,  as  well  as  storage  space.  Multiple  jobs  can  be  run 
simultaneously  in  the  system,  with  the  allocation  of  the  CPUs  to  the  tasks 
determined  by  the  local  scheduling  policies.  There  are  many  different  scheduling 
frameworks,  such  as  FCFS  (First  Come  First  Serve),  LJF  (Long  Job  First),  etc. 
Some scheduling algorithms are adaptive: they choose the appropriate scheduling
strategy  depending  on  the  type  of  the  jobs  in  the  flow. 

The scheduling decision for a contemporary batch system is too
computationally expensive for us to perform thousands of times per time-step in our
agent simulations. In our model, we consider a simplified representation of the
resources and local schedulers. Namely, we assume that each resource is
characterized by its processing power P, which is defined as the CPU time needed to
complete a job of unit length. Within this framework, there is only a single
job running on the system at a given time (note that this approach is different
from the one adopted in Ref. [41], where the capacity of the resource was assumed to
be shared equally over all the jobs in the queue). For simplicity, we will assume
that all the local schedulers prioritize the jobs by their arrival time (FCFS).
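The simplified resource model can be sketched as follows. This is an
illustration under stated assumptions, not the simulation code: the class and
method names are hypothetical, and the run time of a job is taken to be its
length multiplied by P, following the definition of P as CPU time per unit job
length.

```python
class FcfsResource:
    """Single-job-at-a-time resource with a first-come-first-serve queue."""

    def __init__(self, time_per_unit_length):
        self.time_per_unit_length = time_per_unit_length  # the parameter P
        self.free_at = 0.0  # time at which all queued work will be finished

    def submit(self, job_length, now):
        """Queue a job arriving at time `now`; return (start_time, end_time)."""
        start = max(now, self.free_at)          # wait for earlier arrivals
        end = start + job_length * self.time_per_unit_length
        self.free_at = end
        return start, end

if __name__ == "__main__":
    resource = FcfsResource(time_per_unit_length=0.5)
    print(resource.submit(job_length=10, now=0.0))  # runs immediately
    print(resource.submit(job_length=4, now=1.0))   # waits for the first job
```

The queue wait time of a job is then `start - now`, and its execution time is
`end - start`.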
4.3.2  Users

In  general,  users  can  be  thought  of  as  either  individual  agents  that  generate  jobs 
and  try  to  map  resources  for  their  execution,  or  as  external  resource  brokers  that 
map  jobs  on  behalf  of  many  individual  users.  For  the  sake  of  concreteness,  we 
consider  the  first  scenario,  although  the  modelling  approach  developed  in  this 
paper can apply to either case. For the case where user-to-broker relationships
are relatively static, i.e., based on VO structural policies, we would not expect
any impact on the simulation scenarios other than that broker agents have higher
densities of job requests. If the user-to-broker relationship is dynamic, i.e., based
on user observation of broker performance, the system behavior may be more
dynamic, and an explicit multi-tier study is required for those scenarios.

We model users as heterogeneous selfish agents that try to maximize their
utilities. Clearly, one can define agents' utilities in various ways. Often, the
agents are interested in minimizing the waiting time for the jobs they submit;
hence, they will prefer the resource with the minimal (resource
performance-normalized) queue length. Another user-centric measure is the response
time, which is the time elapsed between the job generation and its completion.
Clearly, this metric depends not only on the length of the queue, but also on the
actual processing capacity of a resource: on the resources with larger capacities
the actual runtime of a job will be less. Other possible metrics might be based on
the accuracy of the completion time prediction.

In this paper, we used weighted contributions of two metrics:
$\mu_i = \alpha_i T_w + (1 - \alpha_i) T_{exc}$, where $T_w$ is the queue wait time,
and $T_{exc}$ is the job execution time normalized to the duration of the job (i.e.,
inverse resource capacity). To account for the heterogeneity, the weights
$\alpha_i$ were chosen randomly for each agent. Note that using only the second
contribution ($\alpha_i = 0$) would bias the selection towards the resource with the
highest capacity with no concern about the queue length at that resource. For
sufficiently high loads this would lead to an infinitely growing queue. To prevent
this from happening we used a lower bound for $\alpha_i$ of $\alpha_i = 0.2$.


4.3.3  Resource  Selection 

To complete the definition of our model, we need to describe how the agents
select resources. As the name "reinforcement learning" suggests, agents use their
past experience to choose between the resources. There are many different ways
to incorporate reinforcement learning. In this paper we use Q-learning. For each
possible action (i.e., selecting a specific resource) the agent keeps a Q-value that
indicates the efficiency of that resource in the past. For each new job, the agent
chooses a resource according to the $\epsilon$-greedy rule: with probability
$(1 - \epsilon)$ it chooses the resource with the highest Q-value (ties are broken
randomly), while with (small) probability $\epsilon$ the agent chooses uniformly at
random among the other resources. After each completed job, the agent gets a
reinforcement signal (containing the start time and the end time for that job),
calculates the metric $\mu_i$, and translates it into a reward $r$ for resource $i$
that we have chosen as follows: $r = \mathrm{sign}(\langle \mu_i \rangle - \mu_i)$,
where $\langle \mu_i \rangle$ is the utility averaged over all the submitted jobs.
Finally, the agent updates the Q-values according to
$Q_{i,t+1} \leftarrow Q_{i,t} + \alpha (r - Q_{i,t})$, where $\alpha$ is the learning
rate.
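The selection and update rules described above can be sketched as a small agent
class. This is a minimal illustration, not the simulation code: the class name,
the zero initial Q-values, and the choice to include the current job in the
running average of the metric are assumptions made here.

```python
import random

class QAgent:
    """Epsilon-greedy resource selection with a sign-based reward."""

    def __init__(self, n_resources, weight, epsilon=0.1, learning_rate=0.1, rng=None):
        self.q = [0.0] * n_resources      # assumed initial Q-values
        self.weight = weight              # agent-specific metric weight
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = rng or random.Random()
        self.metric_sum = 0.0
        self.jobs_done = 0

    def choose(self):
        best_value = max(self.q)
        best = [i for i, v in enumerate(self.q) if v == best_value]
        greedy = self.rng.choice(best)    # ties are broken randomly
        if self.rng.random() < self.epsilon:
            others = [i for i in range(len(self.q)) if i != greedy]
            if others:
                return self.rng.choice(others)
        return greedy

    def update(self, resource, wait_time, normalized_exec_time):
        metric = self.weight * wait_time + (1 - self.weight) * normalized_exec_time
        self.metric_sum += metric
        self.jobs_done += 1
        difference = self.metric_sum / self.jobs_done - metric
        reward = (difference > 0) - (difference < 0)   # sign of the difference
        self.q[resource] += self.learning_rate * (reward - self.q[resource])
        return reward

if __name__ == "__main__":
    agent = QAgent(n_resources=3, weight=0.5, rng=random.Random(0))
    agent.update(0, wait_time=10, normalized_exec_time=2)
    agent.update(1, wait_time=2, normalized_exec_time=2)
    print(agent.q, agent.choose())
```

A job that finishes faster than the agent's running average yields a reward of
+1 and raises the Q-value of the resource that ran it; a slower job lowers it.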

To  compare  the  performance  of  the  RL  algorithm  we  also  studied  two  other 
resource  selection  rules: 

Random Selection: Agents choose randomly, with uniform probability,
between the resources. As we will see below, the performance of this algorithm
is very limited in the case of widely heterogeneous resource capacities that we
are interested in.

Least loaded: In this model agents choose the least loaded resource to submit
a job. Note that this selection rule assumes that agents have up-to-date
information about the current utilization level of the resources. This can be
done by keeping a global registry with the load level of each resource. To
escape crowding effects, where many agents choose the least loaded resource
simultaneously, the resource load has to be modified immediately after a job
is submitted. This would lead to a near ideal schedule for our scenario. Note,
however, that in real environments the information is usually not up to date.
We have studied the impact of the crowding effect by introducing a parameter p so
that once a job is submitted to a resource, the load of that resource is updated
only with probability p (for the results presented in this paper we have used
p = 1/4). This parameter affects the apparent temporal coherence of the global
knowledge shared by the agents.
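The Least Loaded rule with a possibly stale registry can be sketched as below.
The function names are hypothetical, and two details are assumptions made for
illustration: a registry update copies the true load of the chosen resource,
and ties between equally loaded resources go to the lowest index.

```python
import random

def least_loaded_choice(registry):
    """Index of the resource with the smallest registered load."""
    return min(range(len(registry)), key=registry.__getitem__)

def submit_least_loaded(registry, true_loads, job_load, p_update, rng=random):
    """Submit one job using the registry, then refresh it with probability p_update.

    registry:   loads as seen by the agents (may be out of date)
    true_loads: actual loads of the resources
    """
    choice = least_loaded_choice(registry)
    true_loads[choice] += job_load
    if rng.random() < p_update:
        registry[choice] = true_loads[choice]   # registry catches up
    return choice

if __name__ == "__main__":
    registry, true_loads = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
    picks = [submit_least_loaded(registry, true_loads, 5.0, 0.25) for _ in range(6)]
    print(picks, true_loads)
```

With `p_update = 1` consecutive jobs spread over the resources, while with small
`p_update` several jobs can pile up on the same resource before the registry
reflects its load, which is the crowding effect described above.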

4.4  Experimental  Results 

In this section we present the results of simulations of our model for N = 1000
agents and R = 250 resources. We neglect the network topology and
the communication costs associated with it. Instead, we assume that each of
the users can submit jobs to any of the resources. At each time step, agents
independently generate jobs at rate $P \in [0.1, 0.2]$. The lengths of the jobs are
drawn randomly from the uniform distribution in the interval $[J_{min}, J_{max}]$. To
take into account the wide dispersion in the job sizes in real Grid applications,
we chose $J_{min} = 10$ and $J_{max} = 1000 = 100 J_{min}$. Note that such a wide
dispersion in the job sizes (as well as resource capacities) is typical for the
Grid. The capacities of the resources were also chosen uniformly in the interval
$[C_{min}, C_{max}]$.

Let us first consider a situation in which the dispersion in the resource
capacities, $C_{\max} - C_{\min}$, is relatively small: $C_{\min} = 350$,
$C_{\max} = 650$. To characterize the system performance, we define the load of
a resource as the total queue length divided by the resource's capacity. In
Fig. 19 we plot the load in the system averaged over the resources as a function
of time. Note that the non-zero average load for the Least Loaded algorithm is
due to the probabilistic failure to update the load levels after each
submission. For a small job arrival rate (Fig. 19a) the Random Selection
algorithm performs better than both Least Loaded and RL: if the job load is
sufficiently low, choosing resources randomly guarantees load balancing. The
situation changes drastically as one increases the job arrival rate (or
decreases $C_{\min}$). In this case, the performance of the Random Selection
algorithm is limited by “bottlenecks”. Because the randomizing agents choose the
resources without considering their capacities, for sufficiently high loads the
queues on the resources with small capacities will grow indefinitely. This is
observed in Fig. 19b.

Figure 19: Average load vs time for job arrival rates a) P = 0.15 and b) P = 0.2
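
The bottleneck argument can be checked with a back-of-the-envelope calculation.
The sketch below assumes that a resource's capacity is the amount of work it
serves per time step and that capacities lie in [350, 650]; both are our reading
of the setup rather than statements from the simulation code, and the function
names are hypothetical.

```python
N, R = 1000, 250            # agents and resources (from the text)
J_MIN, J_MAX = 10, 1000     # job-size range (from the text)
C_MIN, C_MAX = 350, 650     # capacity range; assumed to be work served per step

def mean_inflow_per_resource(P):
    """Expected work arriving at one resource per step under random selection."""
    mean_job = (J_MIN + J_MAX) / 2
    return N * P * mean_job / R

def overloaded_fraction(P):
    """Fraction of uniformly distributed capacities that lie below the mean inflow."""
    inflow = mean_inflow_per_resource(P)
    frac = (inflow - C_MIN) / (C_MAX - C_MIN)
    return min(1.0, max(0.0, frac))

if __name__ == "__main__":
    for P in (0.15, 0.2):
        print(P, mean_inflow_per_resource(P), overloaded_fraction(P))
```

Under these assumptions the mean inflow is about 303 work units per step at
P = 0.15, below every capacity, and about 404 at P = 0.2, which exceeds the
capacity of roughly 18% of the resources. Those are the resources whose queues
would grow without bound under random selection.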

Clearly, the RL algorithm allows the agents to distribute jobs among the
resources much more efficiently than the Random Selection rule. More remarkably,
we find that for some parameter settings it performs quite well compared to the
Least Loaded algorithm, as illustrated in Fig. 20, where we plot the time
evolution of the average job wait time. After a short transient (learning)
period, the average wait time for the RL selection rule falls well below that
for Least Loaded. Thus, although the agents neither exchange information nor
have any global knowledge of the current load levels in the system, the learning
mechanism allows them to distribute jobs among the resources efficiently.

4.4.1  Effect  of  dynamic  agent  population 

As we mentioned in the Introduction, it is envisioned that the users associated
with a VO might join and leave the system dynamically. Hence, it is important
to understand the effect of these dynamics on the resource allocation
mechanism. We address this in our simulations by assuming that at each time
step each agent has a non-zero probability $P_l$ of leaving the system. For each
agent that leaves, we add a new agent that has to start the learning procedure
“from scratch.” As one should expect, for small values of the leaving
probability $P_l$ the impact of the dynamics is negligible. In other words,
introducing a small number of new agents into the system does not affect the
behavior of the others significantly. If, on the other hand, one increases the
leaving probability $P_l$, the situation becomes different: the intrusion of a
large number of unlearned, and hence exploring, agents deteriorates the system
performance, as illustrated in Figure 21, where the average load vs time is
plotted for different values of the leaving probability $P_l$.

Figure 20: Average wait time for Least Loaded and RL selection rules (P = 0.15)

Figure 21: Average load vs time for different leaving probabilities $P_l$ (P = 0.15)
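
A rough way to see why performance degrades as the leaving probability grows is
to estimate what share of the population is still in its learning transient.
The sketch below is an illustration under stated assumptions, not part of the
reported simulations: agents are replaced independently at each step, so agent
age is geometrically distributed, and the length of the transient
(`learning_steps`) is a hypothetical parameter.

```python
def fraction_still_learning(p_leave, learning_steps):
    """Steady-state share of agents younger than the learning transient.

    Assumes each agent is replaced independently with probability p_leave
    per step, so P(age < T) = 1 - (1 - p_leave) ** T.
    """
    return 1.0 - (1.0 - p_leave) ** learning_steps

if __name__ == "__main__":
    for p_leave in (0.0001, 0.001, 0.01):
        print(p_leave, fraction_still_learning(p_leave, 100))
```

For a hypothetical 100-step transient, roughly 10% of the agents are still
exploring when the leaving probability is 0.001, against roughly 63% when it
is 0.01.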

4.4.2 Significance of Results

The benefit we have observed for the RL algorithm over random selection already
suggests an improvement over existing Grid metascheduling strategies, many of
which, while performing substantial planning of job sequences etc., make random
or otherwise uniform distribution decisions to spread work among several
(or many) large-scale resources [3]. Even when metaschedulers attempt to use
environmental information, such as load levels, our results suggest that the
RL algorithm can provide better adaptive behavior because each metascheduler
would learn from the environment’s responses to its own queries. Divergence of
the agent’s experience from the resource characteristics published through the
monitoring system will change the agent’s job distribution. Such divergence
can happen due to monitoring system errors or, more likely, due to differences
in access privilege or priority between the reporting entities in the monitoring
system and the metascheduling agent.

A  factor  in  real  Grid  resource  allocation  is  the  latency  and  limited  quality 
of  job  status  information.  Our  simulations  suggest  that  the  RL  algorithm  can 
cope  with  stochastic  reward  information,  and  it  might  not  matter  significantly 
whether  the  noise  in  the  reward  information  is  due  to  variance  in  actual  resource 
behavior  or  in  reporting.  However,  statistically  biased  reporting  information 
from  resource  providers  could  lead  to  poor  agent  behavior.  The  delay  in  reward 
information,  e.g.,  from  learning  after  job  completion  instead  of  at  job  submission 
time, will lengthen the training period. With bursty job arrival, an agent may
perform worse during initial training or during adaptation than it would if it
were able to learn from reward information immediately while processing a
cluster of jobs in a short period.


5  Collective  Mind  Initiative 

The  overall  goal  of  the  Collective  Mind  Initiative  is  to  show  that 

•  Improved  Equipment  Performance, 

•  Weapons  Effectiveness,  and 

•  Mission  Critical  Readiness 

can  come  from  amassing  and  sharing  collective  knowledge  derived  from  the 
community  of  equipment  via  on-board  information  sharing  that  embodies  the 
functions  and  utility  of  agents.  The  knowledge  found  from  a  fleet  of  equipment 
is  to  be  used  to  improve  the  overall  performance  of  the  fleet,  each  single  piece 
of  equipment  in  the  fleet  and  all  equipment  with  similar  components. 

Example 1 Commercial: examination of the logistics records from the collection
of all Honda Accord automobiles shows that it is wise to change the
engine-timing belt before 70,000 miles or risk failure soon thereafter.

Example 2 Military: examination of many failures of a military piece of
equipment showed that a specific type of high tensile bolt was made from
non-MilSpec poor quality metal. All uses of the bolt in all equipment types were
inspected for compliance and replacement if required.

Finding solutions to these examples is, at present, human-intensive. Our
goal is to make it automatic through the development of new Artificial
Intelligence Information Intensive techniques that fit the new DOD concepts of
Net-Centric Warfare and apply across all of the Services.

We proposed to address the opportunity by the development of an innovative
technology: the Collective Mind with Collective Learning and Collective
Reasoning forming Collective Intelligence.

The  approaches  we  propose  to  address  these  challenges  of  Collective  Mind 
capitalize  upon  the  existing  data  in  the  form  of  designed  engineering  models  and 
existing  field  data;  exploit  the  structure  of  platforms  (equipment)  as  a  network  of 
interacting  subsystems;  and  exploit  the  heterogeneous  experience  of  the  various 
platforms (equipment). The knowledge produced as a result of these efforts will
then  be  used  to  improve  maintenance  procedures  in  general,  provide  focused  help 
to  the  individual  maintainer  and  will  be  integrated  into  sophisticated  reasoning 
systems  for  planning  and  scheduling  -  and  for  determining  mission  modifications 
that  could  be  proposed  to  improve  mission  operations. 

The  technical  challenges  are  as  follows: 

1.  Diagnosis  and  prognosis  of  individual  components,  subsystems  and  entire 
platforms; 

2.  Planning  and  scheduling  of  logistics,  including  maintenance,  and  mission 
activities  to  ensure  mission  success,  and 

3. Proposing changes in mission operations to ensure mission success.

The key technical areas proposed for the work here are:

•  Reasoning:  using  substantial,  appropriately  represented  knowledge, 

•  Learning:  from  experience  so  that  the  system  performance  improves 

•  Explanation  of  actions  and  recommendations,  and 

•  Robust  Response  to  surprise  and  contingencies. 

A  collective  of  units  can  be  made  from  data  from  many  units  in  the  field. 
Structural  and  statistically  based  algorithms  would  match  successful  and  failed 
performance  to  maintenance  procedures,  equipment  status  and  environmental 
conditions  to  identify  or  learn  better  ways  to  maintain  equipment. 

The scientific issues underlying this work are individualization and emergent
behavior, and model compositionality, especially the composition of multimodal
learning systems with different strengths in different conditions.

We ran several workshops on the Collective Mind and held many smaller
meetings. The first workshop was attended by potential users, military
and industrial, and the second and third workshops were attended mainly by
researchers from academia and industry.

The Research Agenda reflects both our opinions and those of the many
workshop attendees.

5.1 Background

Our definition of the Collective Mind has three parts: Cognition, Action and
Learning. Collective Cognition is the “sum” of all knowledge obtained from the
collection of all assets in a group and/or similar groups. Collective-based
Action comes from deductions from the Collective Mind resulting in a set of
feasible actions that encompass group knowledge. Collective Learning is the
ability to learn from the collective mind and actions. This is especially
valuable as the individual assets operate in real time in their individual
missions.

The application focus of our work on Collective Intelligence is on Mission
Critical Readiness of Military Equipment. The details of that domain motivate
the working definitions of Collective Mind and serve to give metrics on its
efficacy.

We  postulate  a  scenario  in  which  a  mission  is  proposed  for  the  asset;  assume 
that  the  asset  has  not  been  asked  to  undertake  the  mission  in  the  past,  or  if  it 
has,  the  proposed  mission  may  be  in  a  new  environment. 

The technical challenge is to improve the mission readiness by ensuring that
each and every asset (component, equipment, platform, etc.) employed in the
mission is capable of performing the operations specified to achieve the
objectives of the mission. This involves not only the selection of assets, but
also their preparation for the mission and even their maintenance during
operations. The first step is to search for assets configured like those needed
for the mission that have performed a similar mission in a similar environment
and have a similar history. Here the collective is the set of all assets that
are the same as the set needed for the mission. We therefore need for every
asset a “model” that predicts its performance based upon the condition of its
equipment, its past performance and the conditions expected during the ensuing
mission. More specifically, we need the vector of performance variables that
are some function of the attributes that represent the condition of the asset,
perhaps in terms of its components. The condition of the equipment is in turn
a function of its operation in the past (the missions it has performed,
including the environment it has performed in) and its maintenance history.

The  Collective  Mind  for  this  case  is  the  knowledge  from  the  collective  set 
of  all  assets  as  well  as  the  ability  to  learn  from  the  collective  as  these  assets 
continue  to  operate  in  the  missions. 

If we consider each asset (component, equipment, platform) as an agent with
its own reasoning and learning capability, the Collective Mind is a system of
interrelated actions by these agents. The actions are purposeful and the agents
are attentive to the actions of other agents. The agents construct their actions,
understand that the system consists of connected actions by themselves and
others, and interrelate their actions within the system. In order to accomplish
the foregoing, actions by the agents must be contributions to the goals of the
system; there must be a common representation for each agent to understand the
actions of others and the results of those actions; and the system must recognize
the need for an agent to subordinate its actions to those of the system. In this
conceptualization, the actions are really the mental processes of the collective
mind. The Collective Mind lies in how the agents contribute and represent all
actions, and produce improved group behavior.

These actions can be categorized as collective learning and collective reasoning
(recognizing that in some cases the distinction may be somewhat arbitrary).

A  question  is:  ...  does  the  Collective  achieve  the  desired  behavior? 

Formally, 

a) Let $f_i(x_i)$ be the function (or task) to be optimized by an individual
agent $A_i$, where $i = 1, \ldots, N$, $N$ is the number of individuals in the
collective, and $x_i$ is the parameter of $A_i$.

b) Let $G(f_1(x_1), f_2(x_2), \ldots, f_i(x_i), \ldots, f_N(x_N))$ be the
function that must be optimized by the entire collective.

We are interested in the characteristics of $G$. Furthermore, when $G$ is given,
can the system automatically determine $f_i(x_i)$ for the individuals? In the
simplest case, when $G$ is an additive (sum) function, every individual should
simply maximize $f_i(x_i)$ so that $G$ will be maximized. However, in real-world
applications, $G$ can be much more complex, and some individuals must
“sacrifice” themselves (i.e., minimize their own $f_i(x_i)$) in order to
maximize the value of $G$ for the collective. Ideally, given a new application
$G$, the “collective” should be able to automatically generate $f_i(x_i)$ for
each individual. An example is the voting game, where individuals vote “yes” or
“no”, but their reward depends on the percentage of yes votes in the whole
population. This global reward function is collected by a “Referee” and is not
known by the individuals [45]. Similarly, if we make certain assumptions on the
relationship between individuals (such as that they act as constraints), then
there are some studies that can perform distributed optimization when $G$ is
given and fixed [33].
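
The contrast between an additive and a non-additive G can be made concrete with
a toy vote. The sketch below is purely illustrative: each agent's individual
payoff is taken to be its own vote (1 for yes, 0 for no), the 60% target in
`balanced_G` is a hypothetical choice, and the exhaustive search only works for
a handful of agents.

```python
from itertools import product

def best_joint_action(G, n):
    """Exhaustively find the yes/no (1/0) vote profile that maximizes G."""
    return max(product([0, 1], repeat=n), key=G)

def additive_G(votes):
    """Additive collective objective: the sum of the individual payoffs."""
    return sum(votes)

def balanced_G(votes):
    """Non-additive objective, highest when the yes share hits a 60% target."""
    share = sum(votes) / len(votes)
    return -abs(share - 0.6)

if __name__ == "__main__":
    print(best_joint_action(additive_G, 5))
    print(best_joint_action(balanced_G, 5))
```

With the additive objective every agent votes yes, which is also what each
agent would choose on its own. With the balanced objective the optimum has only
three yes votes out of five, so two agents must accept the lower individual
payoff for the collective to do best.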

Collective Learning: In general, the collective learns by:

• Collective experience - relates knowledge gained from new experience to
prior learning;

• Collective example - relates to events or objects via problem solving
simulations; and

• Collective discovery, i.e., improvisation.

Also, learning takes place in (at least) two dimensions - time and space.

Machine learning has traditionally addressed single agents. Recently, there has
been a new trend toward multi-agent learning. However, the topic of the
“collective mind” has several unique features that demand a new paradigm for
machine learning that we call “Collective Learning”. The new learning problems
include:

• How do individuals learn the structure (some call it topology) of the
organization dynamically? Existing approaches such as hidden Markov models or
Bayesian networks are mostly about learning parameters, and they avoid this
structural learning problem because it is too hard.

• How do individual agents learn a model of the environment and, at the same
time, a model of other individuals? Traditionally, these two modeling
activities are fixed together, and most people claim one would subsume the
other. In the Collective Mind, these two may be related, but they definitely
have different characteristics and require different learning techniques.

The  Collective  Mind  envisions  integrating  domain  knowledge  from  many 
sources  with  real-time  data  feeds  from  deployed  platforms  to  support  rapid 
problem identification and response. Sources of domain knowledge include the
following: 

• The “anatomy” of each platform (What are the components and subsystems?
How are they physically located? How are they connected?),

• The “physiology” of each platform (How does each subcomponent of the
platform contribute to the overall functioning of the platform? Typically,
this is divided into separate models for each subsystem.)

• The maintenance history of each component (When manufactured. History of
maintenance actions. Results of previous tests.)

• The deployment history of each platform (What missions has it participated
in? Where? Under what environmental conditions? How long mothballed? Where?)

The  real-time  data  feeds  include  the  following: 

•  On-board  sensors  on  each  vehicle 

•  Maintenance  events 

•  Debrief  from  crew  after  each  mission  (or  each  shift?) 

Collective  learning  involves  a  distributed  set  of  learning  platforms  that  must 
learn  continually  but  that  only  occasionally  have  opportunities  to  communicate 
with  each  other.  Under  such  conditions,  it  is  not  feasible  to  pool  all  of  the  sensor 
readings  from  all  of  the  platforms  in  real-time.  Instead,  each  platform  must 
form  its  own  hypotheses  and  then,  when  the  opportunity  arises,  communicate 
its hypotheses to the other platforms (along with the key supporting data and
observations).
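
One simple way to picture this exchange is for each platform to keep a compact
summary of its own observations and to merge summaries when contact is possible.
The sketch below is a hypothetical illustration (class and field names are ours,
and a failure-rate estimate stands in for a richer hypothesis); it is not a
proposed design.

```python
from dataclasses import dataclass

@dataclass
class FailureRateHypothesis:
    """Local summary a platform keeps instead of its raw sensor log (hypothetical)."""
    operating_hours: float = 0.0
    failures: int = 0

    def observe(self, hours, failed):
        """Fold one locally observed operating episode into the summary."""
        self.operating_hours += hours
        self.failures += int(failed)

    def rate(self):
        """Estimated failures per operating hour (0.0 before any observation)."""
        return self.failures / self.operating_hours if self.operating_hours else 0.0

    def merge(self, other):
        """Combine two summaries built from disjoint sets of observations."""
        return FailureRateHypothesis(
            self.operating_hours + other.operating_hours,
            self.failures + other.failures,
        )

if __name__ == "__main__":
    a, b = FailureRateHypothesis(), FailureRateHypothesis()
    a.observe(100, True)
    a.observe(100, False)
    b.observe(300, True)
    print(a.merge(b).rate())
```

Only two numbers cross the link per hypothesis, regardless of how many sensor
readings produced them. The merge is valid only when the two summaries cover
disjoint observations; repeated exchanges would need bookkeeping to avoid
counting the same episode twice.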

Existing  learning  methods  do  not  have  good  ways  of  making  use  of  domain 
knowledge.  Existing  learning  methods  are  designed  for  off-line  batch  training 
(e.g.,  constructing  an  optical  character  recognizer  by  training  on  a  database 
of 1 million labeled hand-written characters). Existing learning methods are
designed  for  learning  from  a  single  combined  database,  rather  than  by  combining 
hypotheses  from  many  other  learning  agents. 

Making Learning Knowledge-Guided: Domain knowledge can guide learning
in two ways. First, it can suggest the space of hypotheses to consider. For
example, a learning system that had only sensor readings would have to learn to
relate overall platform failure directly to the history of sensor readings. It
might explain a platform failure in terms of the accumulation of several
episodes of operation in high ambient temperatures. But a knowledge-based
learning system could explain the platform failure in terms of failure of the
engine caused by the added load placed on the engine by the air conditioning
system, resulting from the crew needing more air conditioning to operate
successfully in high ambient temperatures. A knowledge-based learner can relate
sensor readings to individual subsystems and then explain overall platform
failure in terms of the failure of certain subsystems.

Second,  domain  knowledge  can  constrain  the  space  of  possible  explanations. 
If  we  consider  the  space  of  all  mappings  from  raw  sensor  readings  to  platform 
failures,  this  is  an  immense  space.  Rapid  learning  from  small  amounts  of  data 
requires  that  the  space  of  mappings  be  highly  constrained.  Domain  knowledge 
can  constrain  the  space  by  recasting  it  in  terms  of  components  and  subsystems 
rather  than  just  raw  sensor  readings.  It  can  also  suggest  the  direction  of  possible 
effects.  For  example,  operating  an  engine  at  higher  RPMs  tends  to  reduce  engine 
life;  operating  an  engine  at  extreme  temperatures  tends  to  reduce  engine  life,  etc. 
Without  this  kind  of  background  knowledge,  a  learning  system  would  need  to 
consider  (and  reject)  the  hypothesis  that  lower  RPMs  and  normal  temperatures 
reduce  engine  life! 
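
The pruning effect of directional knowledge can be illustrated by counting
hypotheses. In the sketch below, each hypothesis assigns every factor an effect
on engine life of -1 (reduces), 0 (no effect) or +1 (extends); the factor names
and the encoding are hypothetical and chosen only for illustration.

```python
from itertools import product

FACTORS = ["rpm", "temperature_extremity", "humidity"]      # hypothetical names
KNOWN_DIRECTION = {"rpm": -1, "temperature_extremity": -1}  # -1: reduces engine life

def candidate_hypotheses(known):
    """Enumerate sign assignments (-1, 0, +1) that agree with the known directions."""
    for signs in product([-1, 0, 1], repeat=len(FACTORS)):
        h = dict(zip(FACTORS, signs))
        # A factor with a known direction may have that effect or none at all.
        if all(h[f] in (0, d) for f, d in known.items()):
            yield h

if __name__ == "__main__":
    print(len(list(candidate_hypotheses({}))))
    print(len(list(candidate_hypotheses(KNOWN_DIRECTION))))
```

Without background knowledge there are 27 candidate sign patterns for three
factors; the two directional constraints leave 12. The learner never has to
spend data rejecting the pattern in which higher RPM extends engine life.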

Existing research in Inductive Logic Programming and Probabilistic Relational
Models shows how to use domain knowledge to define the space of possible
hypotheses. However, these methods have not been scaled up to large problems
or to problems involving very noisy, sensor-based data.

There  is  only  a  small  amount  of  research  showing  how  domain  knowledge  can 
constrain  the  space  of  possible  hypotheses  considered  by  the  learning  system. 
This  research  is  largely  ad  hoc.  Substantial  work  is  needed  to  develop  good 
modeling  languages  for  describing  the  domain  knowledge  and  good  ways  of 
converting  the  domain  knowledge  into  constraints  on  the  hypothesis  space. 

Making Learning Real-Time: There are two challenges to creating real-time
learning systems. The first challenge is to design online versions of existing
learning algorithms. There is a lot of existing work on online (real-time)
algorithms for training neural networks, linear threshold units, and decision
trees. Most batch search and optimization algorithms can be converted into
online algorithms in principle. The challenge is to find practical, efficient
online versions of these methods.

The  second  challenge  is  to  make  those  online  algorithms  adaptive,  by  which 
we  mean  that  they  can  deal  with  changing  worlds  in  which  new  kinds  of  failures 
occur,  new  kinds  of  sensors  become  available,  old  sensors  cease  to  be  available, 
and  the  probabilities  of  different  faults  change  because  of  changes  in  missions 
and  the  ways  that  platforms  are  being  used  in  missions. 

Existing research on expert-weighting and portfolio algorithms has shown
(theoretically) that they adapt rapidly to changes in the phenomena being
predicted [47]. A DARPA program could transition this work into real-world
systems and show how it can be applied in noisy real-time settings.
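
To give a flavor of expert weighting, the sketch below implements one well-known
scheme: an exponential (multiplicative) loss update followed by a small uniform
"share" step, which keeps every expert's weight above a floor so that the
mixture can recover after the best expert changes. Whether [47] uses exactly
this scheme is not stated here; the learning rate `eta` and share rate `alpha`
are hypothetical settings.

```python
import math

def fixed_share_update(weights, losses, eta=0.5, alpha=0.05):
    """One round: exponential loss update, then share a fraction alpha uniformly."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    posterior = [s / total for s in scaled]
    n = len(weights)
    return [(1 - alpha) * p + alpha / n for p in posterior]

def run(loss_rounds, n_experts):
    """Start from uniform weights and apply the update for every round of losses."""
    weights = [1.0 / n_experts] * n_experts
    for losses in loss_rounds:
        weights = fixed_share_update(weights, losses)
    return weights

if __name__ == "__main__":
    # Expert 0 is right for 20 rounds, then the world changes and expert 1 is right.
    rounds = [[0.0, 1.0]] * 20 + [[1.0, 0.0]] * 20
    print(run(rounds, 2))
```

In the usage example the weight shifts to expert 1 within a few rounds of the
change, because the share step prevented its weight from decaying toward zero
during the first phase.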

Making Learning Collective: The challenge for collective learning is for
multiple learning agents to pool their learned knowledge without pooling all of
their sensor data. Existing research suggests the following directions to pursue:

• First, research on ensemble learning has produced methods that learn to take
a weighted vote of the hypotheses learned by each individual learning agent
[22]. This has been applied, for example, to solve very large learning problems
by randomly dividing the available data into subsets, assigning a separate
agent to learn on each subset, and then voting the resulting learned
hypotheses. This has been shown to give results comparable to those obtained by
training a single system on the entire data set [38].

• Second, research in support vector machines (and related algorithms)
shows how to identify the key data points that support a hypothesis
[6]. These points are known as the “support vectors”, and they are sufficient
to reconstruct the hypothesis perfectly. An interesting direction
would be for the multiple agents to exchange their support vectors and
then use all of these support vectors in learning.
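
The weighted vote in the first direction can be sketched in a few lines. The
function name, the labels and the weights below are hypothetical; in practice
the weights would themselves be learned, for example from each agent's past
accuracy.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-agent labels; each agent's vote counts with its weight."""
    tally = defaultdict(float)
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)

if __name__ == "__main__":
    # Three agents diagnose the same component; the first is trusted more.
    print(weighted_vote(["fail", "ok", "ok"], [0.7, 0.2, 0.2]))
```

Only the predicted labels and one weight per agent need to be communicated,
which is what makes voting attractive when sensor data cannot be pooled.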

5.2 Collective Reasoning (includes planning and scheduling)

In the Collective Mind, knowledge and information are distributed among many
individuals, which makes reasoning and planning/scheduling much harder. One
unique advantage of this distribution is that damage to any individual will not
paralyze the entire organization. The individuals should know where to ask for
and deliver information, know how to recover information when some nodes die,
know what to communicate among themselves in order to make a good plan,
and know how to evaluate a new plan/schedule collectively.

Another aspect of Collective Reasoning is how to divide a global task into
a “workflow” of smaller tasks so that each small task can be performed by
some individuals, and when those small tasks are finished, the results should be
assembled in such a way that a global solution can be readily obtained. This is
the divide-and-conquer problem, typically known as the “Task Allocation” problem.

An  important  opportunity  for  research  is  to  integrate  learned  knowledge  into 
sophisticated  reasoning  systems.  The  Collective  Mind  requires  this,  because 
the  results  of  individual  component  and  vehicle  prognoses  must  be  used  by  the 
mission  +  maintenance  scheduler  to  decide  when  and  how  to  schedule  platforms 
for  missions  and  for  maintenance.  Uncertainty  in  prognoses  must  be  translated 
into  uncertainty  in  mission  success  and/or  the  need  for  maintenance. 

There are at least two approaches to incorporating uncertain predictions
into complex reasoning: propagation of uncertainty and ensembles.

Propagation of uncertainty computes a posterior distribution over random
variables of interest (e.g., mission success, expected equipment losses) based on
the distributions of other random variables (i.e., the prognostic predictions).

Ensemble methods construct a set of non-stochastic “alternative scenarios”
or alternative models and compute schedules based on each scenario. The
resulting schedules are then analyzed to identify consensus scheduling decisions
and/or ways of modifying the schedule so that it will succeed under all scenarios.
Ensemble methods have been very popular in classification learning, but there
has been little research on ensembles for reasoning.
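
The two approaches can be contrasted on a toy readiness question: does a mission
that needs a minimum number of working platforms succeed? The sketch below is an
illustration under simplifying assumptions (independent failures, hypothetical
function names). The first function propagates the prognostic failure
probabilities forward by Monte Carlo sampling; the second checks a plan against
a fixed set of deterministic scenarios.

```python
import random

def mission_success_probability(failure_probs, required, trials, rng):
    """Monte Carlo estimate of P(at least `required` platforms stay operational)."""
    successes = 0
    for _ in range(trials):
        survivors = sum(rng.random() >= p for p in failure_probs)
        successes += survivors >= required
    return successes / trials

def robust_under_scenarios(scenarios, required):
    """Ensemble check: does the plan hold in every deterministic scenario?

    Each scenario is a list of 1/0 flags saying which platforms are available.
    """
    return all(sum(s) >= required for s in scenarios)

if __name__ == "__main__":
    rng = random.Random(0)
    print(mission_success_probability([0.1, 0.1, 0.1], 2, 20000, rng))
    print(robust_under_scenarios([[1, 1, 0], [1, 0, 1], [0, 1, 1]], 2))
```

The first function returns a probability that a scheduler can weigh against
other risks; the second returns a yes/no answer that is only as good as the
scenarios chosen.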

Another research opportunity is to determine and propose modifications to
missions, both in planning and in scheduling of operations. The space of
potential mission modifications is huge, so some way is needed to constrain the
Collective Mind from proposing ridiculous modifications. This can be viewed as
the problem of reasoning about the “utility function” of the commander. We can
imagine that several tradeoffs are operating in a battle theater: (a) mission
goals, (b) safety of troops, (c) reliability of supply, (d) loss of equipment,
(e) speed. There has been recent work on inferring multi-attribute utility
functions by observing the choices made by humans (e.g., in auctions or in video
games). We would also like to develop systems that are instructable, so that a
commanding officer can say “Timing is critical; don’t propose any modifications
that delay the mission.” This learning should begin during exercises and
continue into the battlefield. Once the utility function has been acquired, the
scheduler can generate and evaluate alternative mission modifications and
choose the modification that has the highest utility.

5.3  Collective  Behavior 

A  collective  should  yield  an  emergent  whole  that  is  qualitatively  more  than  the 
sum  of  the  parts.  Its  functionality  ought  not  to  be  something  that  one  of  the 
parts  could  do  by  itself  if  only  it  were  bigger.  Any  such  system  faces  a  dual 
challenge. 

• How does the behavior of the individual elements yield the emergent
behavior of the whole? (Example from our case: how does local awareness
of a platform’s own state roll up into a global assessment of the state of
readiness of the fleet?)

• How can the global functionality be applied to the problem, given that
the individual elements are the only sensors and effectors that the system
has? (Example: if the system learns that alternators with exposure to
high temperature and high humidity have unusually high failure rates,
how does that knowledge affect local decisions at the company or platoon
level?)

This issue is at the heart of the need for compositionality, which may be the
critical issue at the heart of any collective system. What makes
compositionality difficult is that both the individual behaviors of the pieces
and their interactions are typically nonlinear. If one adopts a centralized
approach, these can be addressed relatively simply, but such a solution does not
scale well and is not robust against attack or acts of God.

Potential  ways  for  addressing  these  issues  draw  heavily  on  simulation  and 
concepts  from  statistical  mechanics  as  a  body  of  knowledge  about  how  global 
properties  emerge  from  locally  interacting  entities. 
These approaches are also relevant when we assume that the computational
and communications environment may be constrained - just as in the field.

The  major  challenge  is  to  determine  how  the  required  learning  and  reasoning 
functions  can  be  achieved  under  such  constraints,  providing  graceful  degradation 
(rather  than  catastrophic  failure)  as  communications  and  computational  power 
are  incrementally  degraded.  Conventional  learning  and  reasoning  algorithms 
do  not  decompose  neatly  onto  such  an  architecture,  or  at  least  they  have  not 
been shown to be decomposable this way. Swarming approaches, by contrast,
are ideally suited to such architectures, because of three of their characteristics:

• The individual processes are small compared with the overall system, so
they can easily run on cycles scavenged from other applications on
embedded processors;

•  Each  process  interacts  only  with  others  that  are  co-located  with  it  in  some 
topology.  The  best  fit  comes  when  this  topology  is  isomorphic  with  the 
physical  distribution  of  the  platforms,  but  even  if  it  is  not,  it  does  provide 
a  way  to  limit  the  interactions  among  processes  and  thus  function  in  an 
environment  with  bounded  communications. 

• Their emergent dynamics are robust to incremental changes. They tend to
degrade gracefully over wide parameter ranges. (There are, naturally,
limits beyond which they cannot function, and characterizing these is critical
to a program in this area, but they offer a much broader range of
operability than do conventional mechanisms.)
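The second and third characteristics can be illustrated with a minimal sketch, assuming a ring topology in which each process averages its value with one randomly chosen neighbor and a fraction of messages is lost. The function names, the ring layout and the parameter values are hypothetical choices for illustration; they are not taken from the study.

```python
import random

def ring(n):
    """Neighbor map for a ring: each process sees only its two neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def gossip_round(values, neighbors, drop_prob, rng):
    """One round of pairwise averaging; each exchange is lost with drop_prob."""
    values = list(values)
    for i, nbrs in neighbors.items():
        j = rng.choice(nbrs)
        if rng.random() < drop_prob:
            continue  # message lost: this exchange simply does not happen
        avg = (values[i] + values[j]) / 2.0
        values[i] = values[j] = avg  # symmetric update keeps the global sum
    return values

def spread(values):
    """Disagreement among processes: largest minus smallest local value."""
    return max(values) - min(values)

def run(n=20, rounds=400, drop_prob=0.0, seed=0):
    """Start from values 0..n-1 and apply local averaging for `rounds` rounds."""
    rng = random.Random(seed)
    values = [float(i) for i in range(n)]
    nbrs = ring(n)
    for _ in range(rounds):
        values = gossip_round(values, nbrs, drop_prob, rng)
    return values
```

In this sketch, losing messages slows agreement but does not change what the processes agree on: every exchange that does happen preserves the global average, which is one simple form of graceful degradation under bounded communications.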

Integration of Monitoring/Diagnosis with Scheduling and Planning: The
modular vision of "first we assess the state of our platforms, then we plan the
mission" is unrealistic in a highly dynamic environment in which platform state
and mission constraints change constantly. We need new mechanisms that can
incrementally learn and plan in tandem. Such mechanisms must have the
"any-time" characteristic: they quickly produce an approximate answer, and can give
more detail if more time and resources are available to them. In the context of
closely coupled learning and planning, often an early approximation to one half
of the problem (say, learning) can help constrain the space that the planning
function must search, and the planner's early results can in turn help focus the
learner, leading to more rapid convergence than would be possible in a sequential
system.

Swarming  algorithms  are  typically  any-time,  and  have  been  demonstrated 
in  both  classification  and  planning  tasks. 
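The any-time property can be sketched as a generator that yields a usable estimate early and refines it as long as the caller keeps asking. This is a minimal illustration under assumed names (`anytime_mean`, `best_within_budget`) and an assumed budget expressed in update steps; it is not one of the swarming algorithms referred to above.

```python
def anytime_mean(samples, batch=10):
    """Yield (estimate, samples_used) after every `batch` samples."""
    total, count = 0.0, 0
    for x in samples:
        total += x
        count += 1
        if count % batch == 0:
            yield total / count, count
    if count and count % batch != 0:
        yield total / count, count  # final partial batch

def best_within_budget(estimates, max_updates):
    """Return the latest estimate available after at most max_updates refinements."""
    answer = None
    for used, answer in enumerate(estimates, start=1):
        if used >= max_updates:
            break
    return answer

# Hypothetical failure indicators: 1.0 = failure observed, 0.0 = none.
data = [1.0] * 5 + [0.0] * 95
quick = best_within_budget(anytime_mean(data), max_updates=1)
full = best_within_budget(anytime_mean(data), max_updates=1000)
```

Here `quick` is a rough answer based on the first ten samples, while `full` uses all of the data; a planner could start from the former and revise when the latter arrives.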

Environmental Integration: The issue here is discriminating between two
possible interpretations of an aberrant sensor reading: system malfunction (the
system is out of spec) vs. environmental or historical stress (the environment
is out of spec). No matter how complete our models of our systems may be,
complex electromechanical systems will always have emergent properties that
surprise us. We need ways to use other platforms that share the same situation
as an implicit engineering model to distinguish (local) equipment failure from
(shared) environmental stress. More generally, we need to compare behaviors
across platforms that may not be co-situated right now, but that are near to
one another in the space of shared histories.

This  challenge  relies  directly  on  the  notion  of  proximity  among  platforms  in 
some  topology  (physical  space-time;  history  space),  and  so  lends  itself  naturally 
to  swarming  methods,  one  of  whose  hallmarks  is  locality  of  interaction. 
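One simple way to use co-situated platforms as an implicit model is to compare an out-of-spec reading against the typical reading of its peers. The sketch below assumes a single scalar sensor, an upper specification limit and a fixed tolerance; the function name, labels and numbers are hypothetical and serve only to illustrate the idea.

```python
from statistics import median

def classify_reading(own, peer_readings, spec_limit, tolerance):
    """Label a reading using peers that share the same situation.

    own: reading from the platform under test
    peer_readings: readings from nearby (or history-similar) platforms
    spec_limit: upper limit of the normal operating range (assumed)
    tolerance: how far from the peer median still counts as "like the peers"
    """
    if own <= spec_limit:
        return "normal"
    peer_typical = median(peer_readings)
    if abs(own - peer_typical) <= tolerance:
        # Peers look the same, so the environment is the likely cause.
        return "environmental stress"
    # Only this platform deviates, so suspect the equipment itself.
    return "suspected local fault"

# Hypothetical coolant temperatures with an assumed limit of 100.
hot_day = classify_reading(118, [115, 117, 120, 116], spec_limit=100, tolerance=5)
lone_outlier = classify_reading(118, [82, 85, 80, 84], spec_limit=100, tolerance=5)
```

The same comparison could be run over platforms that are close in history space rather than physical space by changing only how `peer_readings` is gathered.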

5.4  Improvisation 

"Making do" and "taking the initiative" are desirable actions of military
personnel tasked with supporting battlefield operations.

Improvisation involves reworking knowledge in time to meet the
requirements of a given situation. Reworking refers to revising or abandoning
planned-for procedures. Time is central to improvisation since the improviser's decision
cannot be undone once it is done. Finally, meeting requirements means
accounting for constraints in the decision setting while acting to meet the goals
of the response. The question of when to improvise involves recognizing when
planned-for procedures cannot or should not be applied. In problem-solving
terms, it may therefore be conceptualized as a categorization problem, in which
the likelihood that a decision maker categorizes correctly is influenced
by a number of factors, such as penalties associated with making an incorrect
choice. The question of how to improvise involves developing and deploying
new procedures in real time. It may be conceptualized as a search and
assembly problem, influenced by factors such as time available for planning, risk in
the environment and the results of prior decisions.
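The role of penalties in the "when to improvise" decision can be made concrete with an expected-cost rule. This is a minimal sketch under the assumption that the decision maker has an estimate of the probability that the planned-for procedure does not apply, and a penalty for each kind of mistake; the names and numbers are hypothetical.

```python
def should_improvise(p_plan_fails, cost_stick_wrongly, cost_improvise_wrongly):
    """Improvise when its expected penalty is lower than that of following the plan.

    p_plan_fails: estimated probability that the planned procedure does not apply
    cost_stick_wrongly: penalty for following the plan when it does not apply
    cost_improvise_wrongly: penalty for abandoning a plan that would have worked
    """
    expected_cost_stick = p_plan_fails * cost_stick_wrongly
    expected_cost_improvise = (1.0 - p_plan_fails) * cost_improvise_wrongly
    return expected_cost_improvise < expected_cost_stick

def improvise_threshold(cost_stick_wrongly, cost_improvise_wrongly):
    """Probability of plan failure above which improvising has lower expected cost."""
    return cost_improvise_wrongly / (cost_stick_wrongly + cost_improvise_wrongly)
```

With penalties of 9 and 1, for example, `improvise_threshold(9, 1)` is 0.1: a severe penalty for wrongly sticking to the plan makes improvisation worthwhile even when the plan is probably still valid.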

Collective improvisation is an approach to supporting battlefield personnel
when the need to develop and deploy new procedures arises. In collective
improvisation, past knowledge (which may be contained in databases such as
ontologies and may be operated upon by decision logics) is re-examined and
reorganized in order to meet new requirements. Results of these improvisations
are then fed back into the system, thereby completing the learning loop.

Related prior work lays the foundation for collective improvisation.
Hayes-Roth and colleagues have developed a series of blackboard-style architectures
to support and in some cases model improvisation. Models built upon these
architectures are enabled with dynamic control, thereby allowing execution of
real-time control plans which specify a sequence of tasks, parameter values and
constraints. In order to support improvisation and capture the learning
involved, the blackboard or any other architecture must have access to, and an
understanding of, models of the physical systems of the platforms involved in
battlefield operations.

5.5  Collective  Mind  Experiment  and  Prototype 

Collective Mind may be defined as using collective learning and collective
reasoning to produce a desired outcome, and thus using the Collective Knowledge of
the fleet to improve any and all individual platforms. The Collective Mind
concept was studied, as described above, by several Workshop Groups composed
of experts from both the Academic and Military communities. The technical
opportunities excited the Academicians and the practical opportunities excited
the Military. It became clear that a proof-of-concept experiment, if it could
be performed within the Study, would serve to crystallize the thinking of both
Communities and show immediate tangible results to inspire both.

A good experiment requires a good corpus of experimental data. While
military data exists, the best and most accessible corpus of data was found at
General Electric Company, in its Locomotive Division's contract Maintenance
Operation. GE locomotives are sophisticated electromechanical machines that
contain big diesel engines, have to operate in all climates, weathers and
terrains, and move heavy equipment. They can therefore be considered
reasonable surrogates for military vehicles such as the Stryker and also, in a sense,
for big armored tanks. GE locomotives are on-line via their on-board satellite
dishes. The captured data is processed in real time to satisfy the needs of GE's
railroad customers, who require financial assurances that freight will move from
A to B reliably and on time, or else a fee has to be paid in compensation. This
problem is similar to that of the Military field commander, who needs to move
equipment from A to B reliably and on time for military reasons and with a
different, even more severe pay-off function.

We selected Improving Mission Reliability, a component of Mission
Operations, as the objective for the "Proof of Concept" experiment. Mission reliability
was broadly defined as follows: given a mission of duration X days, what percentage
of units assigned to that mission are able to complete the mission without a
critical failure? The motivation for high mission reliability in both commercial and
DOD environments is three-fold. First, it gets the mission performed; second, it
makes mission planning and execution more predictable and effective; third, it
reduces the logistics footprint required to support a certain level of readiness.
In the military domain, this may mean picking the best 5 vehicles to conduct
a reconnaissance mission in swampy terrain; in the commercial sector, it may
imply selecting the best 5 locomotives to deliver time-critical shipments from
coast to coast. This problem is accentuated in the case of new mission types or
new equipment platforms when insufficient data exists on how the equipment
will behave in that environment.
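The definition above translates directly into a small calculation. The sketch below assumes that each unit's mission outcome is recorded as a boolean and that some model supplies a predicted probability of completing the mission for each candidate unit; the function names and unit identifiers are hypothetical.

```python
def mission_reliability(outcomes):
    """Percentage of assigned units that completed the mission without a critical failure.

    outcomes: one boolean per assigned unit (True = completed without critical failure).
    """
    if not outcomes:
        raise ValueError("at least one assigned unit is required")
    return 100.0 * sum(outcomes) / len(outcomes)

def select_units(predicted_success, k=5):
    """Pick the k units with the highest predicted probability of mission success.

    predicted_success: dict mapping unit id to an estimated probability.
    """
    ranked = sorted(predicted_success, key=predicted_success.get, reverse=True)
    return ranked[:k]
```

For example, `mission_reliability([True, True, False, True])` gives 75.0, and `select_units({"a": 0.9, "b": 0.4, "c": 0.7}, k=2)` returns `["a", "c"]`.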

The paradigm for new platforms focuses on the continuum and tension
between engineering test cell projections made before deployment, and
retrospective statistical measurement of performance found or measured after a
substantial number of missions. During this 'gap', the first wave of missions on new
platforms, both the commander and the maintainer are selecting and
repairing units 'blindly' with respect to their equipment's expected behavior. The
collective mind approach tries to compensate for the scarcity of operational
experience on any single unit by learning from ground performance of 'peer' units
with current or past similar deployment experience.

5.5.1 The Experiments

The Study funded an experiment (in reality, a series of experiments) conducted
by GE Global Research that applied peer-based learning to predict
time-to-failure performance in locomotives. GE, as part of its normal business, keeps
an extensive and perhaps unique data set of field failures and repair actions from
customers' locomotives. The data was obtained from GE locomotives owned
and operated by GR Rail and Union Pacific. The data is obtained from the normal
computer control systems used in the locomotive and delivered back to GE
by a variety of means, including a satellite dish on each train. That is,
no special sensors were installed for the normal course of business or for our
experiments. The data for our experiment combined design, utilization and
repair information on 1100 locomotives over a two-year time period. Any individual
locomotive's time-between-failures appears chaotic and unpredictable. Caveat:
the following is a description of the best industrial practice we could find. GE
used extensive files and records in their database.

This  project  utilized  existing  systems  on  the  locomotives.  No  new  sensors 
were  added  to  collect  these  data.  This  Collective  Mind  approach  capitalized 
upon  existing  data  collection  methods  and  did  not  design  a  priori  new  sensors 
or  other  data  collection  systems. 

The  data  for  the  study  was  collected  from  four  different  sources: 

1. Locomotive Design & Engineering Data from GE Rail: GE Rail
manufactured the locomotives in this study. As the OEM, GE Rail possessed
engineering data on locomotive models, configurations, date of
manufacture, date of service, the date EOA service was installed, upgrades, and
software modifications.

2. Locomotive Recommendation Data from GE Rail EOA™ remote
monitoring and diagnostics service: For each locomotive, there was a
time-stamped record of when the Expert on Alert (EOA™) system detected
abnormal patterns in the fault data leading to a recommendation being
issued by GE Rail Locomotive Services. A red or yellow recommendation
indicated a problem that was serious and required a fix in the next 7-10
days at most.

3. Locomotive Maintenance Data from Repair Shops: Each red or yellow
recommendation used in the experiments was associated with maintenance
feedback from railroads or GE repair shops which indicated the exact
repair action that successfully fixed the problem. Therefore the data included
only maintenance intervals where a genuine problem existed on the
locomotive that was verified by the maintenance personnel.

4. Locomotive Utilization data from a selected railroad: Each locomotive
maintains an on-board record of a number of utilization-related
parameters that are collected when a locomotive reaches a railroad yard. These
parameters include odometer miles, total megawatt-hours, hours spent
motoring, hours spent in dynamic braking, cumulative engine hours,
cumulative engine hours moving, percentage of time spent in each of the
eight notch settings (analogous to gear settings) and others.

5. Diagnostics are done in GE's Diagnostic center, staffed by a group of
extremely experienced engineers who have final decision power over the
machine-computed recommendation produced via case-based reasoning. The
diagnosis and repair recommendations are then sent to the local depot or
to the train, wherever it is (known through on-board GPS).

The Collective Mind Computational Approach: Peer-based learning
methodologies were investigated since they provide a transparent, adaptable model
mechanism. The particular approach taken was to focus on the representation
and reasoning mechanisms of instance-based reasoning. Instance-based
reasoning (IBR) relies on a collection of previously experienced data that can be kept
in their raw representation. Unlike in Case-based Reasoning (CBR), the data do not
need to be refined, abstracted and organized as cases. Like CBR, IBR is an
analogical approach to reasoning, since it relies on finding previous instances of
similar problems and uses them to create an ensemble of local models. Hence
the definition of similarity plays a critical role in the performance of IBR.
Typically, similarity will be a dynamic concept and will change over the use of the
IBR. Therefore, it is important to apply learning methodologies to define and
adapt it. Furthermore, the concept of similarity is not crisply defined, creating
the need to allow for some degree of vagueness in its evaluation. This issue was
addressed by evolving the design of a similarity function in conjunction with
the design of the attribute space in which the similarity was evaluated. After
developing several exploratory peer-based models, a fuzzy instance-based
classifier (FIBC) was used that was designed by an evolutionary search (instead of
by a manual process). Specifically, the following steps were used:

1. Retrieval of similar instances from the database

2. Evaluation of the similarity measure between the probe and the retrieved
instances

3. Creation of local models using the most similar instances (weighted by
their similarity measures)

4. Aggregation of the outputs of the local models for the probe
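The four steps can be sketched in a few lines. The sketch below is an illustration only, not the FIBC used in the study: it assumes numeric attributes, a triangular membership function per attribute, a fuzzy AND (minimum) to combine attributes, and the simplest possible local model, in which each peer contributes its own observed outcome. All names, widths and data values are hypothetical.

```python
def attribute_similarity(a, b, width):
    """Triangular membership: 1.0 when equal, falling linearly to 0.0 at distance `width`."""
    return max(0.0, 1.0 - abs(a - b) / width)

def similarity(probe, instance, widths):
    """Fuzzy AND (minimum) of the per-attribute similarities."""
    return min(attribute_similarity(p, x, w)
               for p, x, w in zip(probe, instance, widths))

def predict(probe, database, widths, k=3):
    """Similarity-weighted prediction from the k most similar stored instances.

    database: list of (attributes, outcome) pairs kept in raw form.
    Returns None when no stored instance is similar to the probe at all.
    """
    # Steps 1-2: retrieve instances and evaluate their similarity to the probe.
    scored = [(similarity(probe, attrs, widths), outcome)
              for attrs, outcome in database]
    peers = sorted((s for s in scored if s[0] > 0.0),
                   key=lambda s: s[0], reverse=True)[:k]
    if not peers:
        return None
    # Steps 3-4: each peer's outcome acts as a local model; aggregate by weight.
    total_weight = sum(weight for weight, _ in peers)
    return sum(weight * outcome for weight, outcome in peers) / total_weight

# Hypothetical instances: (age in months, utilization), days to next failure.
database = [((10.0, 100.0), 30.0), ((12.0, 110.0), 40.0), ((50.0, 900.0), 5.0)]
widths = (10.0, 100.0)
estimate = predict((10.0, 100.0), database, widths)
```

In this example the third instance has zero similarity to the probe and is ignored, so the estimate is the weighted mean of the first two outcomes with weights 1.0 and 0.8. The `widths` play the role of the similarity-function parameters that the study tuned by evolutionary search rather than by hand.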

It  should  be  noted  that  no  additional  sensors  were  used  for  this  experiment. 
All  data  came  from  on-board  sensors  used  by  the  control  systems  that  regulate 
the  various  subsystems  in  the  locomotive.  This  constraint  implies  that  it  will 
not  be  necessary  to  over-instrument  existing  or  new  platforms  to  re-apply  a 
similar  process  and  obtain  comparable  results. 

The experimental results reveal that consulting a unit's peers, the Collective,
provides a significant increase in the ability to characterize the behavior of that
unit in terms of completing the next mission. The peer-based approach is robust
and degrades gracefully with the information loss that is likely to be present in
the battlefield. In addition, 'rules of thumb' such as using the newest units on
a mission were actually shown to be damaging rather than beneficial.

With this limited data, the use of Evolved Peers provided the best overall
accuracy (60.35%, over 3 times better than random selection) for past
performance. When the selection was limited to a small fixed number of units,
Evolved Peers provided an accuracy of 63.5% (over 10 times better than
random selection) for past performance. Finally, Evolved Peers provided the
best overall accuracy (55%, 2.7 times better than random selection and 1.5 times
better than the best heuristics) for future performance.

The Collective (peer-based) approaches have shown great robustness to
information loss. This will enable mission reliability for minimally instrumented
platforms operating with limited bandwidth.

The experiment showed the applicability of peer-based learning
methodologies with evolutionary algorithms to select the best attributes for representing
peers and to define similarity measures for identifying the most similar peers for
a given unit. By evolving the models over different time slices, we have shown
the ability to dynamically adapt the neighborhoods of peers using incremental
operational and maintenance data. In future work, the structural design of the
attribute space (for the definition of peers) could be extended by using genetic
programming in lieu of evolutionary algorithms, moving from attribute selection
and weighting to attribute construction. The fitness function that trades off classifier
accuracy and confidence could be improved by adding a measure of representation
parsimony and finding Pareto fronts for different tradeoffs.
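The idea of a Pareto front over accuracy and parsimony can be sketched as follows. The sketch assumes each candidate model is summarized by its accuracy and by the number of attributes it uses (fewer being more parsimonious); the candidate values are hypothetical and are not results from the experiment.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both criteria and better on one.

    Each candidate is (accuracy, n_attributes): higher accuracy is better,
    fewer attributes is better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Candidates that no other candidate dominates, in their original order."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical evolved models: (accuracy, number of attributes used).
candidates = [(0.60, 8), (0.58, 4), (0.55, 6), (0.50, 2), (0.60, 10)]
front = pareto_front(candidates)
```

Here the front keeps the 8-attribute, 4-attribute and 2-attribute models; the other two are dropped because a candidate exists that is no less accurate and uses fewer attributes.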

Generating more sophisticated local models for predictions could also extend
the approach. The present assumption was that each peer had a rather "feeble"
track-history, which motivated the peer approach to begin with. In situations
where the peers have a richer track-history, more complex models, whose
parameters could be obtained using a local search method, could be developed. In
addition to the aforementioned technical extensions, one or more experiments
should be conducted using data describing equipment usage more typical of
military operations. For example, the data should include instances of irregular use
of equipment, equipment use in diverse environments in a variety of missions,
and equipment operating conditions that range from little or no usage, through
normal operations, to stressed and overloaded operation.

5.5.2  Conclusion  of  the  Experiment 

The GE experiment showed significant improvement in fleet performance, so
much so that GE has already tested some of the ideas in commercial practice.
The military representatives at the meeting where the results were reported
gave the work a very good report, stating that the technology far exceeded any
technology used in the military today. As a follow-on to this comment, the
team (Sondheimer, Wallace and Will) has continued to brief personnel
extensively in the Pentagon and at various service locations on the potential
benefits of research on the Collective Mind.

5.6 Technology Transfer to the Military

A  key  imperative  from  DARPA  was  to  elicit  and  enlist  support  from  a  real 
military  customer. 

We made visits/presentations to the following organizations, in each case
explaining our concept, listening to their feedback and evolving the concept to
make it more suitable for technology transfer:

•  Department  of  Defense  Condition  Based  Maintenance  +  (OSD  CBM+) 

•  Department  of  Defense  Office  of  Force  Transformation  (OSD  OFT) 

•  US  Air  Force  Expeditionary  Logistics  for  the  21st  Century  (eLog21) 

•  US  Air  Force  Knowledge  Services  (AFKS) 

•  US  Air  Force  Research  Laboratory  (AFRL) 

• US Army AMRDEC

• US Army Future Combat Systems (FCS)

•  US  Army  Logistics  Transformation  Agency  (LTA) 

•  US  Army  Materiel  System  Analysis  Activity  (AMSAA) 

•  US  Army  Objective  Force 

•  US  Army  Research  Laboratory 

•  US  Army  RDECOM  SMS  IPT 

• US Army TARDEC

• Joint Strike Fighter Program Office

• US Marine Corps Systems Command (MARCORSYSCOM)

•  US  Navy  Office  of  Naval  Research 

A  public  presentation  was  made  to  all  services  at  the  2004  DoD  Maintenance 
Symposium  in  Houston,  Texas. 

We also conducted four Workshops on Self-Aware Platforms and the Collective
Mind: January 23-24, 2003, at Strategic Analysis Inc., Arlington, VA; February
13-14, 2003, at USC-ISI, Marina del Rey, CA; January 13, 2004, Erie, NY; and April
1-2, 2004, Marina del Rey, CA.

5.7  Potential  Impact 

The Collective Mind Workshops showed that the Collective Mind topic was
challenging and had both military relevance and academic and scientific interest.

The conclusion is that the objective of the DARPA study has been met;
both military and research and development communities endorse the concept.

Realizing the full benefits of the concept is DARPA-hard. A DARPA
program could stimulate important research in collective learning to pursue the
above and other emergent applicable ideas.

Robustness: now is the time to challenge learning researchers to develop
robust engineering methodologies for deploying learning systems. Machine
learning has, so far, taken place under "laboratory conditions" where PhD researchers
hand-craft the systems to make them work. As a result, while machine learning
provides a revolutionary new method for constructing intelligent systems (such
as handwriting recognition, speech recognition, and artificially-intelligent
simulated characters in games), the resulting systems do not learn after they are
deployed.

Scaling: machine learning currently does not scale. Indeed, even human
learning takes place only within the head of each individual person, and society
spends billions of dollars to combine and communicate this learned knowledge to
other people. The Collective Mind project envisions a learning technology that
is able to rapidly combine knowledge learned separately by many distributed
agents so that each agent can become a "super agent" that benefits from
everything learned by the other agents. This might allow computers to learn very
rapidly and identify patterns that people, with our limited ability to combine
learned knowledge, cannot detect.

Military systems must learn after deployment. This is the key to making
all  kinds  of  computer  systems  adaptive  to  the  needs  and  environments  of  their 
users.  Without  real-time  learning,  our  hand-crafted  intelligent  systems  will 
remain  brittle  and  hard-to-use. 

Machine learning is currently too slow. Most systems must be trained on
thousands or millions of training trials. Learning can be much faster if domain
knowledge is available to guide and constrain the process. Success in the
Collective Mind project will produce a widely-applicable knowledge-guided learning
technology.

The  core  of  Transformation  in  the  Military  is  rapid  learning.  Introducing 
new  equipment  at  a  rapid  pace  demands  fast  learning.  DARPA  needs  to  step 
up to the challenge. It is DARPA-hard.

6  Discussion  of  Results  and  Approach 

The mathematical methods used to analyze collective swarm behavior are based
on viewing individual agents as stochastic Markov processes. In order to
construct a description of the behavior of a swarm, we do not need to know the
exact trajectories of every agent; instead, we derive a model that governs the
dynamics of the aggregate, or average, swarm behavior.
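As a minimal sketch of what such an aggregate model looks like, consider a hypothetical swarm in which each agent is either searching or avoiding an obstacle, leaves the searching state at a constant rate and returns to it at another constant rate. The average number of searching agents N_s then obeys dN_s/dt = -a N_s + b (N - N_s), where N is the swarm size. The two-state controller, the rate values and the function names below are assumptions made for illustration; they are not one of the models developed in this report.

```python
def integrate_two_state(n_agents, rate_leave, rate_return, dt=0.01, t_end=50.0):
    """Euler integration of dNs/dt = -rate_leave * Ns + rate_return * (N - Ns).

    Ns is the average number of agents in the "searching" state; all agents
    start out searching. Returns Ns at time t_end.
    """
    n_search = float(n_agents)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        n_avoid = n_agents - n_search
        n_search += dt * (-rate_leave * n_search + rate_return * n_avoid)
    return n_search

def steady_state(n_agents, rate_leave, rate_return):
    """Closed-form fixed point of the rate equation: N * b / (a + b)."""
    return n_agents * rate_return / (rate_leave + rate_return)
```

With 100 agents and rates 0.2 and 0.8, the integration settles at the closed-form value of 80 searching agents on average. No individual trajectory is tracked; only the aggregate quantity appears in the model.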

Mathematical models are straightforward to construct and analyze; in fact,
they can be easily constructed from details of the individual robot controller [29].
The ease of use comes at a price, namely, the number of simplifying assumptions
that were made in order to produce a mathematically tractable model. First,
we assume that the robots are functioning in a dilute limit, where they are
sufficiently separated that their actions are largely independent of one another.
Second, we assume that the transition rates can be represented by aggregate
quantities that are spatially uniform (unless we are explicitly modeling
interactions with external fields) and independent of the details of the individual robot's
actions or history. We also assume the system is homogeneous, with modeled
robots characterized by a set of parameters, each of them representing the mean
value of some real robot feature: mean speed, mean duration for performing a
certain maneuver, and so on. Real robot systems are heterogeneous: even if the
robots are executing the same controller, there will always be variations due to
inherent differences in hardware. We do not consider parameter distributions
in our models as would be necessary to describe such heterogeneous systems.
Finally, mathematical models more reliably describe systems where fluctuations
(deviations from the mean behavior) can be neglected, as happens in large
systems or when many experimental runs are aggregated. However, the success we
achieved in quantitatively predicting and explaining results of experiments and
simulations with real robots gives us confidence in the validity of our approach.


7  Future  Directions 

We  have  successfully  demonstrated  that  mathematical  analysis  can  be  used  to 
describe  collective  behavior  of  large  multi-agent  systems,  including  real  robot 
systems.  Analysis  is  fundamental  to  building  predictable  and  verifiable  systems, 
two  aspects  critical  to  swarm  deployment. 

Our work also has significant consequences for the design of multi-agent
systems. In fact, our vision is to give system designers tools to programmatically
synthesize and optimize adaptive controllers for intelligent agents that will make
up intelligent swarms.

Figure 22: Design cycle of Agent-based computing algorithms

Our vision of the design lifecycle is based on the synthesis → analysis →
optimization loop shown in Figure 22. The designer specifies behavior of an individual
agent or component: its task requirements, its capabilities, its interactions with
other agents, as well as its response to environmental stimuli and
contingencies, such as possible actions of an adversary. We then apply Machine Learning
techniques to learn the automaton that describes the agent controller. Once we
have the controller, we use the mathematical framework we developed in the
course of the TASK program to model the behavior of an ensemble of agents
executing this controller. The evolution of the collective behavior can be
studied quantitatively, and results of analysis used to guide performance-enhancing
modifications in the controller.

The realization of this vision requires further development of our
mathematical framework to model more sophisticated agent behaviors, such as learning
from experience, learning from the environment, or responding to other agents'
modifications of the environment. It will also require the development of tools
to synthesize agent controllers from their specifications. Our initial study of
automatic synthesis specifically [17] and of the design problem in general [5] shows
this to be a promising approach.